PREFACE

This report, prepared by Gregory B. Baecher of NEXUS Associates, Wayland,
Massachusetts, with assistance from D. DeGroot and C. Erikson, under Contract
DACW39-83-M-0067, provides details for the statistical analysis of geotechnical
engineering aspects of new dam projects. It was part of work done by the US
Army Engineer Waterways Experiment Station (WES) in the Civil Works
Investigation Study (CWIS) sponsored by the Office, Chief of Engineers, US
Army. This study was conducted during the period October 1983 to September
1985 under CWIS Work Unit 32221, entitled "Probabilistic Methods in Soil
Mechanics." Mr. Richard Davidson was the OCE Technical Monitor.

The report is an introduction to practical techniques of statistical data
analysis for use in geotechnical engineering. The intended audience is the
practicing geotechnical engineer with little or no background in statistics.
Readers with a developed background in statistics may find the methodological
presentation rudimentary, but may still find interest in the numerical
examples, which come from actual construction projects. Two other reports were
prepared under the same contract, "Statistical Quality Control for Engineered
Embankments" (Contract Report GL-87-2) and "Error Analysis for Geotechnical
Engineering" (Contract Report GL-87-3), in addition to a final report.

Ms. Mary Ellen Hynes-Griffin, Earthquake Engineering and Geophysics
Division (EEGD), Geotechnical Laboratory (GL), WES, was the Contracting
Officer's Representative and WES Principal Investigator for CWIS Work Unit
32221. General supervision was provided by Dr. A. G. Franklin, Chief, EEGD,
and Dr. W. F. Marcuson III, Chief, GL.

Commander and Director of WES during the publication of this report was
COL Dwayne G. Lee, CE. Dr. Robert W. Whalin was Technical Director.

STATISTICAL ANALYSIS FOR GEOTECHNICAL DATA


PART I: INTRODUCTION

Background


Traditionally, the planning of geotechnical site characterization and the
analysis of the resulting data have been accomplished by ad hoc procedures.
These rest primarily on intuition and visual inspection of data. Advances in
geotechnical testing and modeling, combined with stricter regulatory oversight,
have led to changes with important implications for site characterization and
data analysis. Principal among these are: (a) increased numbers and quality
of geotechnical data, (b) increased concern with quality assurance in
engineering, and (c) increased regulatory interest in the connection between
performance assessments, parameter estimates, and supporting data.

At the same time, growing experience with the use of simple statistical
methods in geotechnical engineering has provided techniques tailored to the
special needs of geotechnical practice. These methods provide means for
accommodating recent changes and for improving the practice of geotechnical
engineering. Such statistical methods are well suited to automatic data
processing; they provide an explicit, repeatable procedure for obtaining
parameter values; and they allow quantified levels of confidence to be
assigned to parameter estimates.


Purpose 

The purpose of this report is to provide potential users of statistical
methods for geotechnical data analysis with an introduction to practical
concepts, definitions, and techniques. The report is not exhaustive; it
intends to present simple, useful techniques in sufficient detail that a
reader not already conversant with statistical theory may undertake practical
analyses of geotechnical data. These analyses should make better, more
powerful use of data than has been possible with ad hoc procedures, and should
provide estimates of uncertainty in engineering parameters to serve as the
basis for error analysis of engineering calculations. This report complements
material presented in "Error Analysis for Geotechnical Engineering" (Contract
Report GL-87-3), in which the use of quantified estimates of uncertainty and
error in geotechnical modeling is discussed.

General Description of Statistical Analysis

The approach to statistical analysis of geotechnical data developed in this
report is based on summarizing a parameter value by two numbers: a best
estimate and a measure of uncertainty. The 'mean', or arithmetic average, is
used for the first; the 'standard deviation', or root-mean-square variation,
is used for the second. These and other statistical terms are defined as they
appear in later sections. Importantly, the methods used in the report do not
require restrictive assumptions about the shape of probability distributions
(e.g., the assumption of Normal distributions), and as a result the report
considers probability distributions with only passing interest. The main
concept behind the approach of this report is that uncertainty or error in
geotechnical parameter estimates can be divided into four types, and the
importance of each can be analyzed individually. The ability to consider each
principal source of uncertainty separately greatly simplifies the task of
analyzing data. Once each source of uncertainty has been considered
individually, explicit rules based on probability theory are used to calculate
the overall uncertainty in a parameter estimate.

The four types of uncertainty in a geotechnical parameter estimate are
(a) actual variability in the soil deposit, (b) random measurement error,
(c) measurement bias, and (d) limited numbers of tests (Fig. 1). The first two
cause the scatter so common in geotechnical measurements. The last two cause
systematic errors, which are unrelated to location. Each of these sources of
uncertainty affects engineering calculations in its own way and, as a result,
should be analyzed individually. At the end, the four uncertainties are
combined to construct a statistical soil profile. The statistical profile
shows the best-estimate profile of soil properties with depth and provides
uncertainty envelopes about that profile. The statistical design profile is
the first step in error analysis, as described in the accompanying report,
"Error Analysis for Geotechnical Engineering" (Contract Report GL-87-3).


Organization of This Report

This report is organized in five parts. After the Introduction, Part II
describes common techniques for summarizing data using statistical
descriptions. Part III introduces techniques for modeling and summarizing the
spatial character of soil property data, and the means for establishing the
amount of measurement error in observed data scatter. Part IV addresses
systematic or bias errors in measurements and in models. Finally, Part V puts
the techniques of Parts II, III, and IV together to summarize a soil profile.


PART II: DESCRIBING SOILS DATA


Engineering data on soil properties are usually scattered. Graphical and
simple mathematical techniques are useful in summarizing this scatter so that
a better understanding of the data can be developed. For the present purposes,
such graphical and mathematical techniques are used to obtain (a) best
estimates of soil engineering properties and (b) quantitative assessments of
the uncertainty or error in such estimates.

Histograms and Frequency Distributions

Histograms and frequency distributions are graphical descriptions of the
variability or scatter of data. Plotting a histogram or frequency distribution
is usually the first step in data analysis.

Histograms

A histogram is a diagrammatic representation of the frequency with which
measurements lie within specified intervals of magnitude. For example, Fig. 2a
shows a histogram of standard penetration test (SPT) blow count data within a
single stratum of silty alluvial sand. The intervals along the horizontal axis
of the histogram are each of the same width, and the height of each bar shows
the frequency of data lying within that interval. Since the intervals are all
of the same width, the area of each bar is also proportional to the frequency
of data within that interval.

A histogram is a convenient way of displaying data since many important
features are immediately apparent in diagrammatic form. For example, the data
of Fig. 2a are seen to vary about a central peak at about 9 blows/ft. The data
are more or less symmetric about this peak, and data which vary substantially
from the peak are infrequent. The bulk of the data lies approximately between
3 and 15 blows/ft, with extreme values ranging from 0 to 24 blows/ft. A
symmetric distribution of data like that of Fig. 2a is often described as
bell-shaped.

A histogram of another data set, in this case cone penetration test data, is
shown in Fig. 2b. These data are not symmetric about their peak frequency. The
largest frequency occurs near the lower end of the scale, and while the
frequencies decline on both sides of the peak, they do so more slowly on the
upper side, that is, as penetration resistance increases. Such distributions
are said to be skewed.

To construct a histogram the following procedure is used:

1. Divide the horizontal axis of the graph into about 5 to 10 intervals of
constant width.

2. Count the number of data having values within each interval.

3. Plot this number as a vertical bar above the appropriate interval.

About 5 to 10 intervals are used because this number typically allows a
sufficient number of data in each interval for the observed frequencies to
vary smoothly, and yet provides adequate definition of the shape of the
distribution of data. For small numbers of data a convenient rule of thumb for
choosing the number of intervals is

$$ k = 1 + 3.3 \log_{10} n \qquad (1) $$

in which n = the number of data values and k (rounded up to the next higher
integer) = the number of intervals (Sturges, 1926). The choice of the number
of intervals can affect the visual interpretation of data scatter. Thus, it is
sometimes useful to construct more than one histogram, using a different
number of intervals on each plot, in order to obtain an intuitive feel for the
data scatter. This problem is circumvented by using a frequency distribution,
as described below.

Usually, it is convenient to specify interval boundaries to one more decimal
place than that to which the data are measured (e.g., boundaries at 2.5, 5.5,
8.5, and so on for blow counts recorded as whole numbers), which avoids the
problem of where to place values falling directly on an interval boundary.
When this is not possible, some consistent procedure should be adopted for
deciding how to count data which fall directly on an interval boundary. For
example, any value lying on a boundary might be automatically counted in the
lower interval. Some people prefer to allocate 1/2 unit to each adjacent
interval. This is an acceptable procedure, but it leads to noninteger
frequencies, which may be awkward.
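The histogram procedure, with Sturges' rule (Eqn. 1) supplying a default
number of intervals, might be sketched in Python as follows. This is a minimal
illustration, not a prescribed method. The function names are hypothetical,
the intervals are assumed to span the smallest to the largest value, and a
value lying exactly on an interior boundary goes into the lower interval, one
of the consistent conventions mentioned above.

```python
import bisect
import math


def sturges_intervals(n):
    """Number of intervals from Eqn. 1, k = 1 + 3.3 log10(n), rounded up."""
    if n < 1:
        raise ValueError("need at least one data value")
    return math.ceil(1 + 3.3 * math.log10(n))


def histogram(data, k=None):
    """Count data in k equal-width intervals spanning min(data) to max(data).

    Returns (edges, counts) with len(edges) == k + 1. A value lying exactly
    on an interior boundary is counted in the lower interval.
    """
    lo, hi = min(data), max(data)
    if k is None:
        k = sturges_intervals(len(data))
    if hi == lo:
        hi = lo + 1.0  # all values equal: use a single unit-width span
    width = (hi - lo) / k
    edges = [lo + j * width for j in range(k + 1)]
    counts = [0] * k
    for x in data:
        # bisect_left gives the index of the first edge >= x, so a value equal
        # to an edge falls in the interval below it; clamp the ends into range.
        j = min(max(bisect.bisect_left(edges, x) - 1, 0), k - 1)
        counts[j] += 1
    return edges, counts


# Usage: nine illustrative values (not data from this report), four intervals
edges, counts = histogram([0, 1, 2, 3, 4, 5, 6, 7, 8], k=4)
print(edges)   # [0.0, 2.0, 4.0, 6.0, 8.0]
print(counts)  # [3, 2, 2, 2]
```

Calling histogram with several values of k is a quick way to follow the
advice above about comparing histograms drawn with different numbers of
intervals.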


Frequency Distributions

A frequency distribution is obtained by changing the vertical axis from the
frequency of data within class intervals to the cumulative fraction of data
less than a particular value. The frequency distribution is thus a
fraction-less-than (or percent-less-than) curve. Fig. 3 shows the frequency
distribution for the SPT data of Fig. 2a.

To construct a frequency distribution the following procedure is used:

1. Arrange the data in ascending order, x_1 ≤ x_2 ≤ ... ≤ x_n.

2. For each value x_i, calculate the frequency of data less than or equal to
that value, f_i = i/n. For the largest value, assign the frequency
f_n = n/(n + 1).

3. Plot the value of the data x_i along the horizontal axis and its
corresponding cumulative frequency f_i along the vertical axis.
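The three steps above can be sketched in Python, assuming the data are held in
a plain list. The function name is hypothetical. It returns the (x_i, f_i)
pairs that step 3 would plot. Tied values simply receive successive
frequencies in this sketch.

```python
def frequency_distribution(data):
    """Return (x_i, f_i) pairs for a cumulative frequency plot.

    Step 1: sort the data in ascending order.
    Step 2: f_i = i/n for the i-th smallest value, except that the largest
            value is assigned f_n = n/(n + 1).
    """
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        return []
    pairs = [(x, i / n) for i, x in enumerate(xs, start=1)]
    pairs[-1] = (xs[-1], n / (n + 1))
    return pairs


# Usage with four illustrative values
print(frequency_distribution([5, 1, 3, 2]))
# [(1, 0.25), (2, 0.5), (3, 0.75), (5, 0.8)]
```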
The advantages of the frequency distribution are that it does not require
data to be grouped into arbitrary numbers of intervals, and that the fraction
of data less than or greater than any value can be read immediately from the
graph. The disadvantage is that the shape of the distribution of data is not
as clearly apparent in a frequency distribution as in a histogram.

Probability Paper

Probability paper is graph paper with special grids designed such that the
cumulative frequencies of particular types of frequency distributions plot as
straight lines. Fig. 4 shows the data of Fig. 2a plotted on Normal probability
paper. Normal probability paper causes bell-shaped distributions (more
precisely, Normal distributions) to plot as straight lines. Other types of
probability paper are also available. In this report little use is made of
the mathematical shape of the frequency distributions of data. Nevertheless,
probability papers are commonly encountered in practice and in statistical
software, and are often a convenient way to plot data.
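Normal probability paper can be imitated on ordinary axes. Each cumulative
frequency f_i is replaced by the standard Normal variate z_i whose cumulative
probability is f_i, and x_i is plotted against z_i. Normally distributed data
then fall close to a straight line. The sketch below assumes Python 3.8 or
later for statistics.NormalDist, and its function name is hypothetical. The
inverse Normal distribution is undefined at a frequency of exactly 1. That is
presumably one reason the frequency distribution procedure assigns the largest
value f_n = n/(n + 1) rather than n/n.

```python
from statistics import NormalDist


def normal_paper_coordinates(pairs):
    """Map (x_i, f_i) pairs to (x_i, z_i), with z_i the standard Normal
    variate at cumulative probability f_i (requires 0 < f_i < 1)."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    return [(x, std_normal.inv_cdf(f)) for x, f in pairs]


# Usage: illustrative pairs built from the Fig. 2a mean and standard deviation;
# the 0.16, 0.5, and 0.84 fractiles map to roughly z = -1, 0, and +1.
for x, z in normal_paper_coordinates([(4.5, 0.16), (8.9, 0.5), (13.3, 0.84)]):
    print(x, round(z, 2))
```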

Mean and Standard Deviation

Graphical descriptions of the variability among data are useful for obtaining
a feeling for the scatter in a particular data set, but for engineering
applications a mathematical description of data scatter is usually needed.
This is conveniently provided by the mean and standard deviation. The mean is
a quantitative measure of the central location of the scatter of measurements
along the x-axis. The standard deviation is a quantitative measure of the
dispersion of the measurements. Together, the mean and standard deviation
summarize important information about the distribution of measured values,
and provide a useful description of data scatter for use in analysis.

The mean of a set of measurements x_i, i = 1, ..., n, is the arithmetic
average,

$$ m_x = \frac{1}{n} \sum_{i=1}^{n} x_i \quad \text{(mean)} \qquad (2) $$

The mean is the center of gravity of the data along the x-axis. In this
report, the mean is used as the best estimate of a soil parameter because it
is neither conservative nor unconservative. In some references the mean is
called the expected value of x and denoted E[x], but this expression is not
used here. For the SPT data of Fig. 2a the mean is 8.9 blows/ft.


Standard Deviation

The standard deviation measures the variability of data about their mean.
Mathematically, the standard deviation is the square root of the sum of
squared differences between each measurement and the mean, divided by n - 1:

$$ s_x = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - m_x)^2} \quad \text{(standard deviation)} \qquad (3) $$


For the SPT data of Fig. 2a, the standard deviation is 4.4 blows/ft. The
standard deviation can be thought of as the square root of the moment of
inertia of the data about the mean. Whereas the mean describes the center of
the data along the x-axis, the standard deviation describes the spread. The
mean and standard deviation are measured in the same units as the data
themselves. The denominator (n - 1) is used in Eqn. 3 rather than n because
the mean m_x that appears in Eqn. 3 is itself estimated from the data. Because
m_x is the value that minimizes the sum of squared deviations of the data, and
to the extent that m_x differs slightly from the real mean of x in a soil
deposit, the variations (x_i - m_x)^2 are on average slightly smaller than the
corresponding variations about the real mean. Mathematically, the squared
variations are on average too small by the factor (n - 1)/n, and thus the
denominator in Eqn. 3 corrects for this bias.
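Eqns. 2 and 3 translate directly into code. The sketch below is a minimal
Python version with hypothetical function names. The standard-library
functions statistics.mean and statistics.stdev compute the same quantities,
and the latter also uses the n - 1 divisor.

```python
import math


def mean(xs):
    """Arithmetic average m_x, Eqn. 2."""
    return sum(xs) / len(xs)


def std_dev(xs):
    """Standard deviation s_x, Eqn. 3, using the n - 1 divisor."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))


# Usage with eight illustrative values
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(data))               # 5.0
print(round(std_dev(data), 3))  # 2.138
```

Dividing by n instead of n - 1 would give 2.0 for the same data. That smaller
value is the kind of underestimate discussed above, and the difference shrinks
as n grows.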

For a bell-shaped, or Normal, distribution of data the mean occurs at the
0.5 fractile. The 0.5 fractile, denoted x_0.5, is that value of x which splits
the data into two sets of equal size: 50% of the data are smaller than x_0.5
and 50% are larger. The value x_0.5 is commonly called the median. Similarly,
for a Normal distribution the mean plus one standard deviation occurs at the
0.84 fractile, and the mean minus one standard deviation occurs at the 0.16
fractile. These values can be determined from tables of the Normal
distribution, which are found in most statistics textbooks (e.g., Benjamin and
Cornell, 1970). When data plot as a straight line on Normal probability paper,
the mean and standard deviation can be readily estimated by fitting a line to
the data and determining the values of x which correspond to the 0.16, 0.5,
and 0.84 fractiles. Denoting these x_0.16, x_0.5, and x_0.84,

$$ m_x \approx x_{0.5} \qquad (4) $$

$$ s_x \approx \frac{x_{0.84} - x_{0.16}}{2} \qquad (5) $$
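Reading fractiles off a fitted line can also be done numerically. The sketch
below assumes the line on Normal probability paper is fitted by least squares,
regressing x on the standard Normal variate z. This is an assumption: the text
leaves the fitting method open, and a fit by eye is common. The sketch uses
the plotting positions of the frequency distribution procedure and then
applies Eqns. 4 and 5. Because 0.84 is a rounded value of the Normal fractile
one standard deviation above the mean (about 0.8413), Eqn. 5 returns roughly
0.994 times the slope of the fitted line, a negligible difference in practice.
The function name is hypothetical.

```python
from statistics import NormalDist


def moments_from_probability_plot(data):
    """Estimate m_x and s_x from a straight line fitted on Normal probability
    paper, reading x at the 0.16, 0.5, and 0.84 fractiles (Eqns. 4 and 5)."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    freqs = [i / n for i in range(1, n + 1)]
    freqs[-1] = n / (n + 1)  # largest value, as in the frequency distribution
    std_normal = NormalDist()
    zs = [std_normal.inv_cdf(f) for f in freqs]
    # Least-squares line x = c0 + c1 * z through the plotted points
    z_bar, x_bar = sum(zs) / n, sum(xs) / n
    c1 = (sum((z - z_bar) * (x - x_bar) for z, x in zip(zs, xs))
          / sum((z - z_bar) ** 2 for z in zs))
    c0 = x_bar - c1 * z_bar

    def x_at(p):
        return c0 + c1 * std_normal.inv_cdf(p)

    x16, x50, x84 = x_at(0.16), x_at(0.5), x_at(0.84)
    return x50, (x84 - x16) / 2  # Eqns. 4 and 5


# Usage with ten illustrative values (not data from this report)
m, s = moments_from_probability_plot([3, 5, 7, 8, 9, 9, 10, 11, 13, 15])
print(round(m, 2), round(s, 2))
```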

In calculations it is sometimes convenient to deal with s_x^2 rather than
s_x, just as in mechanics it is convenient to deal with the moment of inertia
rather than its square root. The square of the standard deviation is called
the variance, and corresponds to the moment of inertia in mechanics. The
variance is the moment of inertia of the data about the mean m_x,

$$ V_x = s_x^2 \quad \text{(variance)} \qquad (6) $$

The variance of the data in Fig. 2a is (4.4 blows/ft)² = 19.4 (blows/ft)². The
variance is measured in the square of the units of the data: if the data are
measured in blows/ft, the variance is measured in (blows/ft)². Given their
similarity to mechanical moments, the mean and variance are often called
(statistical) moments of the data. The mean is the first moment about x = 0;
the variance is the second moment about x = m_x. A description of soil
properties using only means and standard deviations is said to be a
second-moment description.

Coefficient of Variation

The ratio of the standard deviation to the mean, or the proportional
variability, is called the coefficient of variation,

$$ \Omega_x = \frac{s_x}{m_x} \quad \text{(coefficient of variation)} \qquad (7) $$

The coefficient of variation of the data in Fig. 2a is
Ω_x = (4.4 blows/ft)/(8.9 blows/ft) = 0.49, which could also be expressed as a
percentage (i.e., 49%).
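The variance and coefficient of variation quoted above for the SPT data can be
checked with a few lines of Python. The function names are hypothetical. The
inputs are the mean (8.9 blows/ft) and standard deviation (4.4 blows/ft) given
in the text for Fig. 2a.

```python
def variance(s):
    """Eqn. 6: V_x = s_x ** 2, in the square of the data's units."""
    return s ** 2


def coefficient_of_variation(s, m):
    """Eqn. 7: Omega_x = s_x / m_x, dimensionless; undefined for a zero mean."""
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return s / m


s_spt, m_spt = 4.4, 8.9  # blows/ft, SPT data of Fig. 2a as quoted in the text
print(round(variance(s_spt), 1))                         # 19.4
print(round(coefficient_of_variation(s_spt, m_spt), 2))  # 0.49
```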

Correlation

For two or more soil properties, variations in different properties may be
associated with one another. That is, variations may not be independent. For
example, the water content and undrained strength of clays are known to be
associated with one another. Thus, variations in water content and undrained
strength are not independent; they depend on one another through causal
mechanical factors.

Soil properties or engineering parameters may also be associated with one
another not by a causal mechanical factor but by the way they are measured or
estimated. For example, triaxial compression tests might be performed to
estimate the effective strength parameters (c', φ') of the Mohr-Coulomb
strength criterion. If c' and φ' are estimated by fitting a line to the
resulting Mohr circles, error can be introduced by the way the envelope is
fit. An envelope drawn too flat leads to a φ' which is too small. An envelope
drawn too steep leads to a φ' which is too large. However, if an envelope is
drawn too flat, then for the envelope to still fit the data, the cohesion
intercept c' must be made larger than it should be. Conversely, if the
envelope is drawn too steep, the cohesion intercept must be made smaller than
it should be to still fit the data. Errors in the estimates of φ' and c' are
therefore associated with one another, and the association is negative: an
underestimate of one tends to accompany an overestimate of the other.

The strength of association between soil parameters is expressed by the
correlation coefficient,

$$ r_{xy} = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{x_i - m_x}{s_x} \right) \left( \frac{y_i - m_y}{s_y} \right) \quad \text{(correlation coefficient)} \qquad (8) $$

in which m_x and m_y = the means of x and y, respectively, and s_x and s_y =
the respective standard deviations of x and y. The two terms within the
summation are the deviations of x and y measured in units of their respective
standard deviations. That is, they are standardized, dimensionless deviates.
Thus the correlation coefficient is a non-dimensional measure of the degree to
which two parameters vary together.
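A minimal Python sketch of Eqn. 8 follows, with a hypothetical function name.
One convention has to be chosen. Eqn. 8 divides the sum by n, and if the
standard deviations inside it are computed with the n - 1 divisor of Eqn. 3,
r_xy for perfectly linear data comes out as (n - 1)/n rather than 1. This
sketch assumes the 1/n divisor throughout so that the bounds of ±1 hold
exactly. Python 3.10 and later also provide statistics.correlation, which
gives the same value.

```python
import math


def correlation(xs, ys):
    """Correlation coefficient r_xy in the form of Eqn. 8.

    The standard deviations here use the same 1/n divisor as the outer sum,
    so that r_xy is exactly +1 or -1 for perfectly linear data.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples of at least two values")
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    if sx == 0 or sy == 0:
        raise ValueError("r_xy is undefined when either variable is constant")
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / n


# Usage: illustrative values only
print(round(correlation([1, 2, 3, 4], [2, 4, 6, 8]), 6))  # 1.0
print(round(correlation([1, 2, 3, 4], [8, 6, 4, 2]), 6))  # -1.0
```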
The range of r_xy is -1 ≤ r_xy ≤ +1. A value r_xy = +1 indicates a perfect
linear relation between x and y having positive slope; r_xy = -1 indicates a
perfect linear relation between x and y having negative slope; r_xy = 0
indicates no linear relation between x and y. When r_xy = 0, x and y are said
to be uncorrelated, and the scatter diagram of y plotted against x shows no
linear trend.

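If the variations of x and y are not normalized by their respective standard
deviations, the covariance is obtained,

$$ C_{x,y} = \frac{1}{n} \sum_{i=1}^{n} (x_i - m_x)(y_i - m_y) \quad \text{(covariance)} \qquad (9) $$

The covariance is not dimensionless; it has the units of x multiplied by the
units of y. From Eqns. 8 and 9,

$$ C_{x,y} = s_x s_y r_{xy} \qquad (10) $$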
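A matching sketch of Eqn. 9 uses the same 1/n divisor as Eqn. 8 and recovers
r_xy by reading Eqn. 10 backwards, r_xy = C_xy/(s_x s_y). The function names
are hypothetical, and the 1/n standard deviation is an assumption chosen to
match. Python 3.10's statistics.covariance divides by n - 1 instead, so its
results are larger by the factor n/(n - 1).

```python
import math


def covariance(xs, ys):
    """Covariance C_xy, Eqn. 9 (divisor n). Units: units of x times units of y."""
    n = len(xs)
    if n != len(ys) or n == 0:
        raise ValueError("need two equal-length, non-empty samples")
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n


def std_dev_n(xs):
    """Standard deviation with the same 1/n divisor, for use in Eqn. 10."""
    n = len(xs)
    m = sum(xs) / n
    return math.sqrt(sum((x - m) ** 2 for x in xs) / n)


# Usage: Eqn. 10 rearranged, r_xy = C_xy / (s_x s_y); illustrative values only
xs, ys = [1, 2, 3], [1, 3, 2]
c = covariance(xs, ys)
r = c / (std_dev_n(xs) * std_dev_n(ys))
print(round(c, 4), round(r, 4))  # 0.3333 0.5
```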

Fig. 5 shows a scatter plot of compaction control data collected during the
construction of an engineered fill. Compaction water content is plotted along
the x-axis; compacted dry density is plotted along the y-axis. Each point
corresponds to one test in which both water content and dry density were
measured. As should be expected, water content and dry density are, on
average, inversely related to one another. The correlation coefficient for
the data of Fig. 5, calculated using Eqn. 8, is r_xy = -0.7.

For comparison, Fig. 6 shows scatter plots of x and y having various
coefficients of correlation. When r_xy > 0 the data cloud slopes upward to the
right. An intuitive feel can be obtained by thinking of a vertical line
through m_x and a horizontal line through m_y dividing the scatter diagram
into four quadrants. In the upper right quadrant both (x_i - m_x) and
(y_i - m_y) are positive, thus their product is positive. In the lower left
quadrant both (x_i - m_x) and (y_i - m_y) are negative, thus again their
product is positive. In the other two quadrants the products are negative.
Therefore any cloud of data which has most of its points in the upper right
and lower left quadrants has r_xy > 0. Conversely, any cloud with most of its
points in the lower right and upper left quadrants has r_xy < 0. If the
points fall equally in all four quadrants, r_xy ≈ 0. It is important to note
that the correlation coefficient is a measure of linear association. Two
parameters may be deterministically related, but nonlinearly, and have an
r_xy other than ±1.
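The last point is easy to demonstrate. In the sketch below, y is computed
exactly from x as y = x². Yet r_xy is zero for values of x placed
symmetrically about zero, and below 1 for positive x. The values are
illustrative. The helper function is hypothetical: it is the ordinary Pearson
correlation coefficient written out in full.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient (equivalent to Eqn. 8 with 1/n divisors)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [-2, -1, 0, 1, 2]
print(pearson_r(xs, [x ** 2 for x in xs]))  # 0.0: y depends fully on x, yet r_xy = 0

xs = [1, 2, 3, 4, 5]
print(round(pearson_r(xs, [x ** 2 for x in xs]), 3))  # 0.981: strong, but not 1
```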

Means and Standard Deviations of Calculated Parameters

Means and standard deviations are used above to describe best estimates and
uncertainties about measured properties. Correlation coefficients are used to
describe association among properties or among uncertainties in properties.
For engineering analysis, measured properties are sometimes transformed
mathematically to obtain desired input parameters for engineering models. For
example, deformation measurements might be used to calculate elastic moduli,
or in situ stresses and measured strengths might be used to calculate
normalized soil properties.

The mathematics needed for relating a second-moment description of soil
properties, loads, or other measurements to a corresponding second-moment
description of calculated results is relatively uncomplicated. First, some
equation is chosen for calculating the results of interest. For example, to
calculate elastic modulus from stress and strain measurements the equation
would be E = σ/ε, in which σ = stress and ε = strain. Next, means, standard
deviations, and correlation coefficients are evaluated for all the input
parameters. In the example, the input parameters would be stress and strain,
and the corresponding statistical moments would be m_σ, m_ε, s_σ, s_ε, and
r_σε. Then these means, standard deviations, and correlation coefficients are
used in conjunction with the equation to determine the resulting means,
standard deviations, and correlation coefficients (if applicable) of the
calculated result(s). In the example, the result is the scalar value E.

Mean of a Calculated Parameter

Operationally, mean soil properties are propagated through an equation using
a first-order approximation. This is a linear approximation in the vicinity
of the best estimates of the soil properties. Mathematically, the calculation
of some result y based on a soil parameter x can be expressed as a function,

$$ y = g(x) \qquad (11) $$

By taking a Taylor series expansion of g(x) about the point m_x and then
truncating all but the first two (i.e., constant and linear) terms, the
tangent at m_x is obtained (Fig. 7). For most geotechnical purposes this
linearization is sufficiently accurate. For strongly nonlinear cases, other
methods are available. These are discussed in the report "Error Analysis for
Geotechnical Engineering" (Contract Report GL-87-3). Applying rudimentary
probability theory leads to the convenient result,

$$ m_y \approx g(m_x) \qquad (12) $$

in which ≈ indicates a first-order approximation. In words, the mean or best
estimate of the result y is the function evaluated at the mean or best
estimate of the parameter x.


This is the common deterministic solution, using the best estimates of the
parameters.

Standard Deviation of a Calculated Parameter

The standard deviation of a soil parameter can also be propagated through an
equation y = g(x) to find a corresponding standard deviation of the
calculated parameter y. The first-order approximation leads to the relation

$$ s_y \approx \left| \frac{dy}{dx} \right| s_x \qquad (13) $$

in which the derivative dy/dx, evaluated at m_x, can be thought of as an
influence factor. In words, the standard deviation of the prediction y is the
product of the standard deviation of the parameter x and an influence factor
equal to the magnitude of the derivative of y with respect to x. For modulus
calculated from an uncertain stress but known strain, s_E = (dE/dσ) s_σ =
s_σ/ε. The relation is exact when g(x) is linear.
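Eqns. 12 and 13 can be applied to any function g whose derivative at m_x can
be evaluated. The Python sketch below estimates dg/dx with a central finite
difference. That is an assumption of the sketch rather than part of the
method, and an analytical derivative serves equally well. The numbers in the
usage example are hypothetical: a known strain of 0.002, and a stress with
mean 100 kPa and standard deviation 10 kPa.

```python
def first_order_moments(g, m_x, s_x, rel_step=1e-6):
    """First-order mean and standard deviation of y = g(x), Eqns. 12 and 13.

    The influence factor dg/dx at m_x is estimated by a central difference.
    """
    h = rel_step * (abs(m_x) or 1.0)
    dg_dx = (g(m_x + h) - g(m_x - h)) / (2 * h)
    return g(m_x), abs(dg_dx) * s_x


# Usage: E = stress / strain with a known strain (hypothetical numbers, kPa)
strain = 0.002
m_E, s_E = first_order_moments(lambda stress: stress / strain, 100.0, 10.0)
print(round(m_E), round(s_E))  # 50000 5000
```

For a nonlinear g the result is only approximate. With g(x) = x², for
example, the first-order mean g(m_x) = m_x² omits the term s_x² that appears
in the exact mean of x².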

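When the prediction y depends on a set of parameters, x = {x_1, ..., x_n},
the equivalent forms of Eqns. 12 and 13 are

$$ m_y \approx g(m_{x_1}, \ldots, m_{x_n}) \qquad (14) $$

$$ s_y^2 \approx \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{\partial g}{\partial x_i} \frac{\partial g}{\partial x_j} C_{x_i,x_j} \qquad (15) $$

in which the partial derivatives are evaluated at the means. Note that when
the x_i and x_j are independent, C_{x_i,x_j} = 0 for i ≠ j, and
C_{x_i,x_i} = s_{x_i}^2 = V_{x_i} for i = j; thus,

$$ s_y^2 \approx \sum_{i=1}^{n} \left( \frac{\partial g}{\partial x_i} \right)^2 V_{x_i} \qquad (16) $$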
The example calculation of modulus from both an uncertain stress and an
uncertain strain is carried out in Plate 1. Two special cases deserve note
because they are common in practice and lead to simple results. For the case
in which y is a linear combination of a set of independent parameters,
y = Σ a_i x_i, the variance of y is exactly

$$ V_y = \sum_{i} a_i^2 V_{x_i} \qquad (17) $$

For the case in which y is a power function (product) of a set of independent
parameters, y = Π x_i^(a_i), the coefficient of variation of y is
approximately

$$ 1 + \Omega_y^2 \approx \prod_{i} \left( 1 + a_i^2 \Omega_{x_i}^2 \right) \qquad (18) $$

which for small coefficients of variation (e.g., less than 30%) reduces to

$$ \Omega_y^2 \approx \sum_{i} a_i^2 \Omega_{x_i}^2 \qquad (19) $$
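Eqns. 14 and 15 extend the same idea to several parameters described by a
covariance matrix. The sketch below applies them to the modulus example,
E = σ/ε, with stress and strain both uncertain and independent. The numbers
are hypothetical and are not those of Plate 1: m_σ = 100 kPa, s_σ = 10 kPa,
m_ε = 0.002, s_ε = 0.0002. The finite-difference derivatives are again an
assumption of the sketch, and the function name is hypothetical. Because
E = σ¹ε⁻¹ is a power function, the result can be checked against Eqn. 19,
which gives Ω_E² ≈ 0.10² + 0.10², so Ω_E ≈ 0.14.

```python
import math


def fosm(g, means, cov, rel_step=1e-6):
    """First-order second-moment estimates for y = g(x_1, ..., x_n).

    means: the m_xi; cov: n x n matrix of covariances C_{xi,xj}, with the
    variances V_xi on the diagonal. Returns (m_y, s_y) from Eqns. 14 and 15.
    """
    n = len(means)
    grads = []
    for i in range(n):
        h = rel_step * (abs(means[i]) or 1.0)
        up, down = list(means), list(means)
        up[i] += h
        down[i] -= h
        grads.append((g(*up) - g(*down)) / (2 * h))  # dg/dx_i at the means
    var_y = sum(grads[i] * grads[j] * cov[i][j]
                for i in range(n) for j in range(n))
    return g(*means), math.sqrt(max(var_y, 0.0))  # guard tiny negative round-off


# Usage: modulus from uncertain, independent stress and strain (hypothetical)
m_E, s_E = fosm(lambda stress, strain: stress / strain,
                [100.0, 0.002],
                [[10.0 ** 2, 0.0], [0.0, 0.0002 ** 2]])
print(round(m_E), round(s_E / m_E, 3))  # 50000 0.141
```

Nonzero off-diagonal entries in cov carry correlated inputs, such as the c'
and φ' pair discussed earlier, through the same calculation.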

Regression Analysis

When two soil properties or parameters are associated with one another, their
correlation can be used to predict one property or parameter from the other.
This is done with regression analysis, which is used to fit lines or curves
to data. For example, regression analysis can be used to estimate the
undrained strength of a saturated clay from its water content.

The common criterion for fitting trend lines or curves to data is to minimize
the sum of squared residuals about the trend. This is called the
least-squares fit. The cone penetration resistance data shown in Fig. 8
appear to increase more or less linearly with depth. Mathematically, this
trend in the data can be expressed as

$$ y = a + bx \qquad (20) $$

in which y = cone penetration resistance, x = depth, and a and b are
constants. The constant a is the intercept at x = 0; b is the slope.

The problem of trend fitting is to estimate the coefficients a and b from a
set of n data pairs (x_i, y_i) such that the resulting trend line is 'best.'
Under the least-squares criterion, a and b are estimated such that the sum of
the squared residuals in the y-direction, Σ [y_i - (a + b x_i)]², is
minimized. The values of a and b which minimize the sum of squared residuals
provide the best prediction of y for a given x, and can be shown to be
(Benjamin and Cornell, 1970)

$$ a = \frac{(\sum x_i^2)(\sum y_i) - (\sum x_i)(\sum x_i y_i)}{n \sum x_i^2 - (\sum x_i)^2} \qquad (21) $$

$$ b = \frac{n \sum x_i y_i - (\sum x_i)(\sum y_i)}{n \sum x_i^2 - (\sum x_i)^2} \qquad (22) $$
The variance of the residuals is

$$ V_u = \frac{1}{n-2} \sum \left[y_i - (a + b x_i)\right]^2 \qquad (23) $$

The best-fitting line for the data of Fig. 8 is shown in that figure. The two envelopes about the line are plus and minus one standard deviation, $s_u = \sqrt{V_u}$:

upper envelope: $y = a + bx + s_u$

lower envelope: $y = a + bx - s_u$
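Eqns. 21 through 23 can be coded directly, as the following minimal Python sketch shows. The function name `least_squares_line` is illustrative, and the depth and cone-resistance values are hypothetical, not the Fig. 8 data. The n - 2 divisor for the residual variance is also an assumption; it is the usual choice when two coefficients are estimated.

```python
import math


def least_squares_line(x, y):
    """Fit y = a + b*x by least squares (Eqns. 21 and 22).

    Returns the intercept a, slope b, and residual variance V_u, using an
    n - 2 divisor (assumed here, since two coefficients are estimated).
    """
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xx = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    denom = n * sum_xx - sum_x ** 2                # shared denominator
    a = (sum_xx * sum_y - sum_x * sum_xy) / denom  # Eqn. 21: intercept
    b = (n * sum_xy - sum_x * sum_y) / denom       # Eqn. 22: slope
    # Eqn. 23: variance of the vertical residuals about the fitted line
    v_u = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    return a, b, v_u


# Hypothetical depths (m) and cone resistances (MPa), for illustration only
depth = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
qc = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]

a, b, v_u = least_squares_line(depth, qc)
s_u = math.sqrt(v_u)

# Plus/minus one standard deviation envelopes at a chosen depth
x0 = 3.5
upper = a + b * x0 + s_u
lower = a + b * x0 - s_u
print(f"a = {a:.3f}, b = {b:.3f}, s_u = {s_u:.3f}")
```

The envelopes are parallel to the fitted line because the residual scatter is assumed to be the same all along it.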


In regression analysis the best-fitting line is chosen to be the one which minimizes squared deviations of the data in the y-direction (i.e., vertically). This is the line which gives the best estimate of y for a known value of x. If the reverse prediction were desired, that is, the best estimate of x for a known value of y, then a different regression line would give the best result. To predict x from y, the best line is the one which minimizes squared deviations of the data in the x-direction. This is found by interchanging the x's and y's in Eqns. 20 through 23.
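The difference between the y-on-x and x-on-y lines can be checked numerically. In this sketch, which uses hypothetical data and an illustrative helper name, both slopes come from the same least-squares formula with the roles of x and y interchanged. The product of the two slopes equals the squared correlation coefficient, so the two lines coincide only when the data fall exactly on a line.

```python
def ls_slope(u, v):
    """Least-squares slope of v on u (minimizes squared deviations in v)."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    s_uv = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    s_uu = sum((ui - mean_u) ** 2 for ui in u)
    return s_uv / s_uu


# Hypothetical paired data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 2.5, 4.5, 4.0, 6.0]

b_yx = ls_slope(x, y)  # best line for predicting y from x
b_xy = ls_slope(y, x)  # best line for predicting x from y
# On a y-versus-x plot the x-on-y line has slope 1/b_xy, not b_yx
print(f"y on x slope: {b_yx:.3f}; x on y line, drawn in y-x axes: {1 / b_xy:.3f}")
print(f"b_yx * b_xy = r^2 = {b_yx * b_xy:.3f}")
```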

Non-linear trends are fit to data in much the same way as lines are. Typically, a direct least-squares fit is used, sometimes after a transformation of the data to fit a linear model. For example, exponential or power functions can be transformed through the logarithm,

$$ y = a x^b $$

$$ \ln y = \ln a + b \ln x $$

and then a linear regression of ln y on ln x is fit. This is a common approach, although statisticians usually warn that a transformation of data such as this implicitly alters some statistical assumptions underlying regression analysis (Snedecor and Cochran, 1980). For example, with linear regression analysis the scatter of the data about the best-fitting line is assumed to be the same all along the line. If regression analysis is applied to the logarithm of the data, and the same Eqns. 21 and 22 are used to estimate regression coefficients, then the scatter of the logarithm of the data, not the data themselves, is implicitly assumed to be the same all along the line. In many, but not all, cases the transformation of a non-linear relation to a linear one causes few difficulties.
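Here is a minimal sketch of the logarithmic transformation for the power function $y = a x^b$, fitting a straight line to ln y versus ln x. The helper name and the noise-free data are hypothetical. Because the fit is done on logarithms, it is the scatter of ln y, not of y, that is treated as uniform along the line.

```python
import math


def fit_power_law(x, y):
    """Fit y = a * x**b by least squares on ln y versus ln x.

    All x and y values must be positive for the logarithms to exist.
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_lx = sum(lx) / n
    mean_ly = sum(ly) / n
    b = (sum((u - mean_lx) * (v - mean_ly) for u, v in zip(lx, ly))
         / sum((u - mean_lx) ** 2 for u in lx))
    ln_a = mean_ly - b * mean_lx  # intercept of the log-log line
    return math.exp(ln_a), b


# Hypothetical noise-free data generated from y = 2 x^0.5
x = [1.0, 4.0, 9.0, 16.0]
y = [2.0 * v ** 0.5 for v in x]
a, b = fit_power_law(x, y)
print(f"a = {a:.3f}, b = {b:.3f}")  # recovers a = 2, b = 0.5
```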

Shortcut  Estimates 

In  a  number  of  situations  faced  in  the  field,  quick  but  only  approximate 
estimates  of  means,  standard  deviation,  or  correlation  coefficients  are  desired 
from  limited  numbers  of  data.  Shortcut  techniques  are  available  for  this 
purpose.  These  provide  savings  of  time  and  effort  while  often  causing  only 
minor  losses  of  precision. 

Shortcuts  for  Estimating  the  Mean 

An easy, quick, and often good estimate of the mean can be obtained from the median. The median is the middle value of a data set: the value which is larger than half the measurements and smaller than the other half. For example, if five data are listed in ascending order, 6, 9, 10, 12, 15, the median is 10. For an even number of data, say 6, 9, 10, 12, 15, 16, the two middle values are averaged to give the median, that is, (10+12)/2 = 11. For data scatter which is symmetric about its central value and for small numbers of data, the sample median is a good estimate of the mean. On the other hand, if the data scatter is asymmetric (for example, if there are many small values and a few large values), the sample median is not a good estimator of the mean.
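The median rule takes only a few lines, including the averaging of the two middle values when the count is even. This sketch reproduces the two examples in the text. Python's standard `statistics.median` applies the same rule and is used here as a cross-check.

```python
import statistics


def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


odd_case = median([6, 9, 10, 12, 15])       # 10
even_case = median([6, 9, 10, 12, 15, 16])  # (10 + 12) / 2 = 11.0
assert even_case == statistics.median([6, 9, 10, 12, 15, 16])
```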

A second shortcut for estimating the mean is to take one-half the sum of the largest and smallest measured values, $(1/2)(x_{max} + x_{min})$. This estimator is sensitive to the extreme values in a set of measurements, and thus fluctuates considerably. It is not a good shortcut estimator and should be used only with caution.
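The sensitivity of the midrange to extreme values is easy to see numerically. In this sketch the strengths are those of Plate 2. The altered data set, with one value raised from 51 to 71 kPa, is hypothetical.

```python
import statistics


def midrange(values):
    """Shortcut mean estimate (1/2)(x_max + x_min)."""
    return 0.5 * (max(values) + min(values))


strengths = [38, 51, 43, 39, 48, 45, 42, 45, 49]  # kPa, Plate 2
altered = [38, 71, 43, 39, 48, 45, 42, 45, 49]    # hypothetical outlier

for data in (strengths, altered):
    print(f"midrange = {midrange(data):.1f}, "
          f"mean = {statistics.mean(data):.1f}, "
          f"median = {statistics.median(data):.1f}")
```

Raising that one value by 20 kPa moves the midrange by 10 kPa and the mean by about 2.2 kPa, but leaves the median unchanged.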

Shortcuts  for  Estimating  the  Standard  Deviation 

A useful estimator of the standard deviation from small numbers of tests is the sample range, $w_x = |x_{max} - x_{min}|$. The range is the span of the data from largest to smallest. Like the standard deviation, the range is a measure of dispersion in a set of data. However, the relationship between the standard deviation and the sample range, on average, depends on how many tests are made. To obtain a best estimate of $s_x$ from the range of data $w_x$, a multiplier $N_n$ is used which depends on sample size (Table 1). The best estimate of the standard deviation is $s_x \approx N_n w_x$ (see Plate 2).

As  for  the  sample  median,  the  range  is  a  good  estimator  of  the  standard 
deviation  for  small  n  and  symmetric  data  scatter.  Even  for  modest  n  it  remains 
fairly  good.  However,  for  asymmetric  data  scatter  the  range,  which  is  strongly 
affected  by  outliers,  is  not  a  good  estimator  of  the  standard  deviation. 
Fortunately,  with  the  notable  exception  of  hydraulic  parameters  such  as 
permeability,  most  geotechnical  data  display  symmetric  scatter.  In  the  case  of 
hydraulic permeability data, a logarithmic transformation usually makes the data scatter symmetric, and again the median and range become convenient estimators.
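The range shortcut can be written as a table lookup. This sketch hard-codes the Table 1 multipliers and applies them to the Plate 2 strengths. For n = 12 it uses the standard tabulated value, 0.307. The names `N_N` and `sd_from_range` are illustrative, and sample sizes outside 2 to 20 are not handled.

```python
import statistics

# Multipliers N_n of Table 1 (Snedecor and Cochran, 1980), keyed by sample size n
N_N = {2: 0.886, 3: 0.591, 4: 0.486, 5: 0.430, 6: 0.395, 7: 0.370,
       8: 0.351, 9: 0.337, 10: 0.325, 11: 0.315, 12: 0.307, 13: 0.300,
       14: 0.294, 15: 0.288, 16: 0.283, 17: 0.279, 18: 0.275, 19: 0.271,
       20: 0.268}


def sd_from_range(values):
    """Shortcut estimate s_x ~ N_n * (x_max - x_min); KeyError if n is not in 2..20."""
    w = max(values) - min(values)
    return N_N[len(values)] * w


strengths = [38, 51, 43, 39, 48, 45, 42, 45, 49]  # kPa, Plate 2
shortcut = sd_from_range(strengths)  # 0.337 * 13 kPa
full = statistics.stdev(strengths)   # n - 1 divisor
print(f"range estimate = {shortcut:.2f} kPa, full estimate = {full:.2f} kPa")
```

For these nine tests the two estimates agree to within about 0.04 kPa.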

Shortcuts  for  Estimating  the  Correlation  Coefficient 

Calculation of correlation coefficients by Eqn. 8 can be tedious. A simple and quick approximation is obtained graphically from the shape of the scatter plot of y vs. x. The method works well whenever the outline of the scatter plot is approximately elliptical, and works even with small numbers of data. Using Chatillon's (1984) term and procedure, this is called the balloon method:


STEP 1: Plot a scatter diagram of y vs. x.

STEP 2: Draw an ellipse (balloon) surrounding all or most of the points on the plot.

STEP 3: Measure the vertical height of the ellipse at its center, h, and the vertical height of the ellipse at its extremes, H.

STEP 4: Approximate the correlation coefficient as $r_{xy} \approx \sqrt{1 - (h/H)^2}$.

An example of the method is shown in Fig. 9. For these data the balloon method gives a correlation coefficient of 0.81, whereas the correlation coefficient calculated by Eqn. 8 is 0.83. Empirically, the method works well for $r_{xy} > 0.5$.
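Step 4 is a one-line calculation once h and H have been measured on the plot. The measurements below are hypothetical. The formula gives only the magnitude of r; its sign follows the tilt of the balloon.

```python
import math


def balloon_r(h, H):
    """Chatillon's balloon estimate, |r| ~ sqrt(1 - (h/H)^2).

    h: vertical height of the ellipse at its center
    H: vertical height of the ellipse at its extremes (same units as h)
    """
    ratio = h / H
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("expected 0 <= h <= H")
    return math.sqrt(1.0 - ratio ** 2)


r = balloon_r(3.0, 5.0)  # hypothetical readings in cm on the plot
print(f"r ~ {r:.2f}")    # r ~ 0.80
```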

Shilling (1984) has suggested a similar method for approximately estimating the correlation coefficient. The principal difference from Chatillon's method is that the data are normalized by their standard deviations before being plotted:

STEP 1: Plot a scatter diagram of $(y - m_y)/s_y$ vs. $(x - m_x)/s_x$.

STEP 2: Draw an ellipse surrounding all or most of the points on the plot.

STEP 3: Measure the length of the principal axis of the ellipse having positive slope, D, and the length of the principal axis of the ellipse having negative slope, d.

STEP 4: Approximate the correlation coefficient as $r_{xy} \approx (D^2 - d^2)/(D^2 + d^2)$.

This method works about as well as Chatillon's. For the data of Fig. 9, Shilling's method gives $r_{xy} = 0.80$.
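Shilling's procedure can be sketched the same way. The data are first standardized (Step 1), and the axis lengths D and d read from the sketched ellipse then give r directly. The axis lengths below are hypothetical. Because $D^2 - d^2$ changes sign when the negative-slope axis is the longer one, the formula also yields negative correlations.

```python
import statistics


def standardize(values):
    """Step 1: convert data to (value - mean) / standard deviation."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]


def shilling_r(D, d):
    """Step 4: r ~ (D^2 - d^2) / (D^2 + d^2)."""
    return (D ** 2 - d ** 2) / (D ** 2 + d ** 2)


r_pos = shilling_r(3.0, 1.0)  # hypothetical axes: (9 - 1) / (9 + 1) = 0.8
r_neg = shilling_r(1.0, 3.0)  # longer axis has negative slope: -0.8
z = standardize([38, 51, 43, 39, 48, 45, 42, 45, 49])
```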


Table 1

Multiplier for Estimating Standard Deviation from Sample Range
(from Snedecor and Cochran, 1980)

$$ s_x \approx N_n \,(x_{max} - x_{min}) $$

 n    Multiplier N_n        n    Multiplier N_n
 2    0.886                12    0.307
 3    0.591                13    0.300
 4    0.486                14    0.294
 5    0.430                15    0.288
 6    0.395                16    0.283
 7    0.370                17    0.279
 8    0.351                18    0.275
 9    0.337                19    0.271
10    0.325                20    0.268
11    0.315
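PROBLEM: Calculate constrained modulus from laboratory measurements of stress and strain.

SOLUTION:

DATA: Initial stress $\sigma_0 = 60$ psi (standard deviation $s_{\sigma_0} = 2$ psi)
Stress increment $\Delta\sigma = 5$ psi (standard deviation $s_{\Delta\sigma} = 0.5$ psi)
Measured strain $\varepsilon = 0.096$ (standard deviation $s_\varepsilon = 0.01$)

BEST ESTIMATE OF CONSTRAINED MODULUS:

$$ E = \Delta\sigma / \varepsilon $$

$$ m_E = m_{\Delta\sigma} / m_\varepsilon = 5\ \text{psi} / 0.096 = 52\ \text{psi} $$

UNCERTAINTY (STANDARD DEVIATION) OF MODULUS:

$$ s_E^2 = \left(\frac{\partial E}{\partial \Delta\sigma}\right)^2 s_{\Delta\sigma}^2 + \left(\frac{\partial E}{\partial \varepsilon}\right)^2 s_\varepsilon^2 = \left(\frac{1}{\varepsilon}\right)^2 s_{\Delta\sigma}^2 + \left(\frac{-\Delta\sigma}{\varepsilon^2}\right)^2 s_\varepsilon^2 $$

$$ = \left(\frac{1}{0.096}\right)^2 (0.5\ \text{psi})^2 + \left(\frac{-5\ \text{psi}}{0.096^2}\right)^2 (0.01)^2 = (7.5\ \text{psi})^2 $$

$$ s_E = 7.5\ \text{psi} $$

(Sketch with axis labeled Stress; not reproduced.)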
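The first-order calculation of Plate 1 can be checked numerically. Like the plate itself, this sketch implicitly assumes that the errors in the stress increment and in the strain are uncorrelated. The variable names are illustrative.

```python
import math

# Plate 1 data
d_sigma, s_d_sigma = 5.0, 0.5   # stress increment and its standard deviation (psi)
eps, s_eps = 0.096, 0.01        # measured strain and its standard deviation

# Best estimate of the constrained modulus, E = d_sigma / eps
m_E = d_sigma / eps

# First-order (Taylor series) propagation, assuming uncorrelated errors
dE_d_sigma = 1.0 / eps          # partial derivative with respect to d_sigma
dE_deps = -d_sigma / eps ** 2   # partial derivative with respect to eps
var_E = dE_d_sigma ** 2 * s_d_sigma ** 2 + dE_deps ** 2 * s_eps ** 2
s_E = math.sqrt(var_E)

print(f"m_E = {m_E:.1f} psi, s_E = {s_E:.1f} psi")  # m_E = 52.1 psi, s_E = 7.5 psi
```

The two terms of `var_E` are of similar size (about 27 and 29 psi squared), so neither source of error can be neglected here.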


PLATE 2

SUBJECT: Shortcut estimates of summary parameters.

A] DATA:

Test Number    Measured Strength (kPa)
     1               38
     2               51
     3               43
     4               39
     5               48
     6               45
     7               42
     8               45
     9               49


B] ESTIMATE MEAN:
By Equation 2

$$ m_x = \frac{1}{n}\sum x_i = \frac{1}{9}(400\ \text{kPa}) = 44.4\ \text{kPa} $$

Shortcut Method Using Median

$$ m_x \approx \text{median of } x_i = 45\ \text{kPa} $$


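C] ESTIMATE STANDARD DEVIATION:
By Equation 3

$$ s_x = \left[\frac{1}{n-1}\sum (x_i - m_x)^2\right]^{1/2} = 4.4\ \text{kPa} $$

Shortcut Method Using Range

$$ w_x = x_{max} - x_{min} = 51 - 38\ \text{kPa} = 13\ \text{kPa} $$

$$ s_x \approx N_n w_x $$

From Table 1, $N_9 = 0.337$:

$$ s_x \approx (0.337)(13\ \text{kPa}) = 4.4\ \text{kPa} $$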
Figure 1. Sources of Error or Uncertainty in Soil Property Estimates.

Histograms of Soil Property Data: (a) Symmetric Distribution of Variability; (b) Skewed Distribution of Variability.

Figure 3. Frequency Distribution of SPT Data From Fig

Figure 4. Probability Paper Plot of Compaction Control Data. Middle line shows best fit to data; outside lines show statistical goodness-of-fit (Kolmogorov) bounds.

Figure 7. First-Order Propagation of Error Through The Model y


PART  III:  SPATIAL  VARIATION  AND  DATA  SCATTER 


Soils  are  geological  materials  formed  by  weathering  processes  and,  except 
for  residual  soils,  are  transported  by  physical  means  to  their  present 
locations.  They  have  been  subject  to  various  stresses,  pore  fluids,  and 
physical  and  chemical  changes.  Thus,  it  is  hardly  surprising  that  the  physical 
properties  of  soils  vary  from  place  to  place  within  resulting  deposits. 

The  scatter  observed  in  soil  data  comes  both  from  this  spatial  variability 
and  from  errors  in  testing.  Each  of  these  exhibits  a  distinct  statistical 
signature  which  can  be  used  to  draw  conclusions  about  the  character  of  a  soil 
deposit  and  about  the  quality  of  testing. 

Part  III  presents  the  tools  required  to  interpret  the  structure  of  spatial 
variation,  and  to  draw  conclusions  about  the  impact  of  spatial  variation  on 
engineering  calculations. 

Trends  and  Variations  About  Trends 

In  Part  II,  means  and  standard  deviations  were  used  to  describe  the 
variability  in  a  set  of  soil  property  data.  These  are  useful  measures,  but 
they  combine  data  in  such  a  way  that  spatial  information  is  lost.  To  describe 
the  variation  of  soil  properties  in  space,  additional  tools  are  needed. 

Consider the two sequences of hypothetical measurements shown in Figs. 10a and 10b. Presume that each measurement was made at the same elevation, one in each of nine consecutive borings along a line. These two sets of data have the same mean and standard deviation, but clearly reflect different soil conditions. The first data exhibit a distinct horizontal trend; the second are erratic. This difference cannot be inferred from the mean and standard deviation alone, for they are the same in both cases.

In principle, the spatial variation of a soil deposit can be characterized in detail, but only if a large number of tests is made. In reality, the number of tests required far exceeds that which would be practical. Thus, for engineering purposes a simplification, that is, a model, is introduced within which spatial variability is separated into two parts: (i) a known deterministic trend, and (ii) residual variability about that trend. This model is written

$$ x_i = t_i + u_i \qquad (28) $$

in which $x_i$ is the soil property at location i, $t_i$ is the value of the trend at i, and $u_i$ is the residual variation. The trend is characterized deterministically by an equation. The residuals are characterized statistically, by a mean, standard deviation, and something statisticians call an autocorrelation function. Rather than characterize soil properties at every point, data are used to estimate a smooth trend, and remaining variations are described statistically.

The residuals are characterized statistically because there are too few data to do otherwise. This does not assume soil properties are random; they are not. While statistical techniques provide a convenient way to describe what is known about spatial variation, one must always be wary that grouping data together does not mask a real and crucial "geological detail."

Estimating  Trends 

Trends are estimated by fitting well-defined mathematical functions (i.e., lines, curves, or surfaces) to data points in space. The easiest way to do this is by regression analysis as outlined in Part II. For example, Fig. 11 shows maximum past pressure measurements as a function of depth in a deposit of Gulf of Mexico clay. For geological reasons the increase of maximum past pressure with depth is expected to be linear within this homogeneous stratum. Data from an overlying desiccated crust are not shown.

The equation for the trend of maximum past pressure, $\sigma'_{vm}$, with depth z is

$$ \sigma'_{vm} = a + bz + u \qquad (29) $$

in which a and b are regression coefficients (intercept and slope), and u = residual variation about the trend. Applying Eqns. 21 and 22 to the data leads to

$$ a = 3\ \text{ksf} \qquad (30) $$

$$ b = 0.06\ \text{ksf/ft} \qquad (31) $$

for which the corresponding trend line is shown in Fig. 11. For data analysis purposes, the regression line $m_{\sigma'_{vm}} = 3 + 0.06z$ (ksf, with z in ft) is the best estimate, or mean, of the maximum past pressure as a function of depth.

Residuals  About  Trends 

Residual variation not accounted for by the trend is characterized by a standard deviation or variance. Because of the way the trend is fit, the residuals must have zero mean. The variance of the residuals is calculated by Eqn. 23 to be $V_u = 1\ \text{ksf}^2$. This is the variability of $\sigma'_{vm}$ unexplained by the trend line. Plus and minus one standard deviation bounds with depth are shown in Fig. 11. The standard deviation $s_u$ is the uncertainty in maximum past pressure at any elevation. This is the uncertainty in $\sigma'_{vm}$ at a point in the soil deposit caused by modeling spatial variation with a smoothly varying trend, here the line of Eqn. 20.

Presumably, the standard deviation of the residuals is the same everywhere along the line. This is an assumption of the least-squares fitting procedure. Usually this assumption is good, but it can be relaxed if necessary (Johnson, 1960).
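Separating data into a linear trend and residuals, as in Eqns. 28 and 29, can be sketched as follows. The depths and pressures are hypothetical, not the Gulf of Mexico data of Fig. 11. The sketch shows that least-squares residuals sum to zero. For the residual variance it assumes an n - k divisor with k = 2 trend coefficients.

```python
def detrend_linear(z, x):
    """Fit the trend t = a + b*z by least squares; return a, b and residuals u = x - t."""
    n = len(z)
    mean_z = sum(z) / n
    mean_x = sum(x) / n
    b = (sum((zi - mean_z) * (xi - mean_x) for zi, xi in zip(z, x))
         / sum((zi - mean_z) ** 2 for zi in z))
    a = mean_x - b * mean_z
    u = [xi - (a + b * zi) for zi, xi in zip(z, x)]
    return a, b, u


# Hypothetical maximum past pressures (ksf) at depths z (ft)
z = [10, 20, 30, 40, 50, 60]
p = [3.9, 3.8, 5.2, 5.1, 6.4, 6.3]

a, b, u = detrend_linear(z, p)
k = 2                                         # trend coefficients a and b
v_u = sum(ui ** 2 for ui in u) / (len(u) - k)
print(f"trend: {a:.2f} + {b:.3f} z; mean residual = {sum(u) / len(u):.1e}")
```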

Another  assumption  in  fitting  trends  is  that  residual  variations  are 
unrelated  (i.e.,  independent)  from  one  place  to  another.  In  fact,  this  is 
seldom  the  case  for  geotechnical  data.  Fig.  12a  shows  residuals  which  have 
been  artificially  generated  to  be  independent  from  one  to  another.  Fig.  12b 
shows  residuals  typical  of  most  soil  data.  Inspection  shows  a  difference  in 
character.  The  first  set  appears  'erratic;'  the  second,  'wavy.' 

The waviness of residual soil data reflects spatial structure that is ignored in the regression analysis. If a measurement at depth i in the profile lies above the average trend with depth, as a general rule measurements at adjacent depths also lie above the trend, and vice versa. This is called 'autocorrelation.' The longer the apparent 'wave length' of the residuals, the farther autocorrelation extends.

More formally, autocorrelation is the property that residuals off the mean trend are not statistically independent, and that the degree of association among them, as measured by the correlation coefficient, depends on their relative separation in space.

Correlation  was  introduced  in  Part  II.  Correlation  is  the  property  that, 
on  average,  two  variables  are  associated  with  one  another.  Knowing  the  value 
of  one  provides  information  on  the  value  of  the  other.  The  strength  of  such 
association  is  measured  by  a  correlation  coefficient,  ranging  between  plus  and 
minus  one. 


Autocorrelation is the corresponding association between values of the same variable at different locations. For example, Fig. 13 shows standard
penetration test (SPT) blow counts as a function of depth. In the horizontal direction these blow counts have an approximately constant mean; therefore detrending is not needed. In Figs. 14a, b, c the blow count data are plotted against one another. The horizontal axis records the blow count at location i; the vertical axis records the corresponding blow count at a location separated by $\delta$ from location i. When $\delta$ is large, as in Fig. 14c, the correlation between $u_i$ and $u_{i+\delta}$ is slight. However, as $\delta$ becomes smaller, as in Fig. 14a, the correlation increases. As $\delta \to 0$, naturally, the correlation approaches +1. Plotting the correlation coefficient so obtained as a function of separation distance $\delta$ gives the autocorrelation function, denoted $R_x(\delta)$. Plotting the correlation coefficient multiplied by the data variance (i.e., the covariance of Equation 9) gives the autocovariance function, denoted $C_x(\delta)$. The autocovariance is shown in Fig. 15a.
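The construction of Figs. 14a to 14c can be imitated on a synthetic series by correlating each residual with the one a distance $\delta$ away. In this sketch the residuals are generated by a first-order autoregressive recursion. That model is an assumption chosen only to produce 'wavy' data; the text does not propose it. The correlation of the lagged pairs is then computed for several separations.

```python
import math
import random


def pearson(a, b):
    """Ordinary correlation coefficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def lag_correlation(u, lag):
    """Correlate u_i with u_(i+lag), as in the scatterplots of Fig. 14."""
    return pearson(u[:-lag], u[lag:])


# Synthetic 'wavy' residuals: u_i = 0.8 u_(i-1) + noise (hypothetical model)
rng = random.Random(1)
u = [0.0]
for _ in range(1999):
    u.append(0.8 * u[-1] + rng.gauss(0.0, 1.0))

for lag in (1, 3, 10):
    print(f"separation {lag:2d}: r = {lag_correlation(u, lag):.2f}")
```

The correlation falls off as the separation grows, which is the behavior the autocorrelation function summarizes.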

The effect of correlation structure on residual variation can be seen in Fig. 16, in which four cases are sketched schematically. Spatial variability about a trend is characterized by variance and autocorrelation. Large variance implies that the absolute magnitude of the residuals is large; large autocorrelation implies that the 'wave length' of variation is long.

Trends  vs.  Residuals 

As can be seen from the preceding section, the division of spatial variation into a trend and residuals about the trend is an artifact of analysis. By changing the trend model fit to data, for example, by replacing a linear trend with a polynomial, the variance and autocorrelation function of the residuals can be changed almost arbitrarily. From a practical view, the selection of a trend line or curve is in effect a decision on how much of the data scatter to model as a deterministic function of space, and how much to treat statistically. Dividing spatial variability into a deterministic part and a statistical part is a matter of practicality. Prudence requires that each datum be judged for what it might say about a soil deposit, but engineering analysis requires models of soil properties for making predictions.
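A short calculation on hypothetical data shows that the residual variance depends on the trend chosen. The same measurements are detrended once with a constant mean (k = 1) and once with a straight line (k = 2), and the two residual variances are compared.

```python
def residual_variance(x, fitted, k):
    """Sum of squared residuals divided by n - k."""
    return sum((xi - fi) ** 2 for xi, fi in zip(x, fitted)) / (len(x) - k)


# Hypothetical measurements at equally spaced locations
z = [0, 1, 2, 3, 4, 5, 6, 7]
x = [10.2, 11.1, 11.8, 13.2, 13.9, 15.1, 15.8, 17.0]
n = len(x)

# Trend 1: constant mean
mean_x = sum(x) / n
v_const = residual_variance(x, [mean_x] * n, k=1)

# Trend 2: least-squares straight line
mean_z = sum(z) / n
b = (sum((zi - mean_z) * (xi - mean_x) for zi, xi in zip(z, x))
     / sum((zi - mean_z) ** 2 for zi in z))
a = mean_x - b * mean_z
v_line = residual_variance(x, [a + b * zi for zi in z], k=2)

print(f"constant trend: V_u = {v_const:.3f}; linear trend: V_u = {v_line:.3f}")
```

Most of the scatter that the constant-mean model treats statistically is absorbed by the linear trend.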

As  a  rule  of  thumb,  trend  surfaces  should  be  kept  as  simple  as  possible 
without  doing  injustice  to  a  set  of  data  or  ignoring  the  geologic  setting.  The 
problem  with  using  trend  surfaces  that  are  very  flexible,  as  for  example  high 
order  polynomials,  is  that  the  number  of  data  from  which  the  parameters  of 
those  equations  are  estimated  is  limited.  The  more  parameter  estimates  that  a 
trend  surface  requires,  the  more  uncertainty  there  is  in  the  numerical  values 
of  those  estimates.  Uncertainty  in  regression  coefficient  estimates  increases 
rapidly  as  the  flexibility  of  the  trend  increases.  Uncertainty  in  regression 
coefficients  is  discussed  in  more  detail  in  Part  IV. 

Autocorrelation  and  Autocovariance 

This section presents a more mathematical treatment of autocorrelation and autocovariance. If $x_i = t_i + u_i$ is a continuous variable and the soil deposit is zonally homogeneous, then at locations i and j, which are close together, the residuals $u_i$ and $u_j$ should be expected to be similar. That is, the variations reflected in $u_i$ and $u_j$ are associated with one another. When the locations are close together, the association is usually strong. As the locations become more widely separated, the association usually decreases. As the separation between two locations i and j approaches zero, $u_i$ and $u_j$ become the same, and the association becomes perfect. Conversely, as the separation becomes large, $u_i$ and $u_j$ become independent, and the association becomes zero.

This spatial association of residuals off the trend $t_i$ is summarized by a mathematical function describing the correlation of $u_i$ and $u_j$ as the separation $\delta$ increases. This description is called the autocorrelation function. In concept, the autocorrelation function is a mathematical way of summarizing the correlations shown in the scatterplots of Figs. 14a, b, c. Mathematically, the autocorrelation function $R_x(\delta)$ is

$$ R_x(\delta) = \frac{1}{n_\delta - k} \sum \frac{x_i - t_i}{s_x}\,\frac{x_{i+\delta} - t_{i+\delta}}{s_x} = \text{"autocorrelation function"} \qquad (32) $$

in which $n_\delta$ = the number of data pairs having separation distance $\delta$, and k = the number of coefficients needed to define the trend model (e.g., the parameters a and b in Eqn. 29). $R_x(\delta)$ expresses the correlation of two residuals off the trend surface as a function of their separation distance. By definition, the autocorrelation at zero separation distance is $R_x(0) = 1.0$. Empirically, for most soils, autocorrelation decreases monotonically to zero as $\delta$ increases.

If $R_x(\delta)$ is multiplied by the variance of the residuals $V_u$, the autocovariance function is obtained, as shown in Fig. 15,

$$ C_x(\delta) = R_x(\delta)\, V_u = \text{"autocovariance function"} \qquad (33) $$

The  relationship  between  the  autocorrelation  function  of  Eqn.  32  and  the 
autocovariance  function  of  Eqn.  33  is  the  same  as  that  between  the  correlation 
coefficient  of  Eqn.  8  and  the  covariance  of  Eqn.  9. 

Consider the site shown in Fig. 17, which overlies a hydraulic bay fill. SPT data taken in the silty fine sand between elevations +3 and -7 m show little if any trend horizontally, and so a constant trend at the mean of the data is assumed. Fig. 19 shows the histogram of SPT data. Fig. in shows autocovariance functions in the horizontal direction estimated for three intervals of elevation. At short separation distances the data show distinct association, i.e., correlation. At large separation distances the data exhibit essentially no correlation.

In natural deposits, correlations in the vertical direction extend to much shorter distances than in the horizontal direction. A ratio of about one to ten for these correlation distances is common. Horizontally, autocorrelation may be isotropic (i.e., $R_x(\delta)$ in the north-south direction is the same as $R_x(\delta)$ in the east-west direction) or anisotropic, depending on geologic history; however, in practice, isotropy is often assumed. Also, autocorrelation is typically assumed to be the same everywhere within a deposit. This assumption, called stationarity, is equivalent to assuming that the deposit is statistically homogeneous.

It is important to emphasize that the autocorrelation function is an artifact of the way soil variability is separated between a 'trend' and 'residuals.' Since there is nothing innate about the chosen trend $t_i$, and since changing the trend changes the autocorrelation, the autocorrelation function reflects a modeling decision. The influence of changing trends on $C_x(\delta)$ is illustrated in Figs. 20, 21 and 22, showing data analyzed by Javette (1933). Fig. 21 shows autocorrelations of water content in San Francisco Bay Mud within an interval of 3 ft. Fig. 22 shows the same autocorrelation function when the entire site is considered. The difference arises because in Fig. 21 the mean trend is taken locally within the 3 ft interval, whereas in Fig. 22 the mean trend is taken globally across the site. The schematic drawing in Fig. 23 suggests why the autocorrelations should differ.

Autocorrelation can be found in almost all spatial data which are analyzed using a model of the form of Eqn. 28. For example, Fig. 24 shows the autocorrelation of joint (i.e., rock fracture) density in a copper porphyry deposit; Fig. 25 shows the autocorrelation of water content in the compacted clay core of a rock fill dam; Fig. 26 shows the autocorrelation of cone penetration resistance in North Sea Clay. In mining, the importance of autocorrelation to ore reserve estimates has been recognized for many years.

In mining "geostatistics" a complementary function to the autocorrelation function, called the variogram (Matheron, 1971), is more commonly used to
express  the  spatial  structure  of  data.  The  variogram  requires  a  less 
restrictive  statistical  assumption  on  stationarity  than  the  autocorrelation 
function  requires  and  is  therefore  often  preferred  for  estimation  problems.  On 
the  other  hand,  the  variogram  is  sometimes  more  difficult  to  use  in  engineering 
analyses,  and  thus  for  geotechnical  purposes  the  autocorrelation  is  more 
commonly  used.  In  practice,  the  two  ways  of  characterizing  spatial  structure 
are  quite  similar. 

Estimating  Autocovariance  and  Autocorrelation 

This section considers only a straightforward and often used approach to estimating autocovariance and autocorrelation, the 'moment estimate.' For more detailed discussion of statistical aspects of estimating $C_x(\delta)$, including more efficient estimators, see Appendix A.

Consider the simple case of measurements made at equally spaced intervals along a line, as for example in a boring. Presume that the measurements $x = \{x_1, \ldots, x_n\}$ are unaffected by measurement error. The autocovariance of the measurements at separation $\delta$ is

$$ C_x(\delta) = \frac{1}{n - \delta - 1} \sum_{i=1}^{n-\delta} (x_i - t_i)(x_{i+\delta} - t_{i+\delta}) \qquad (34) $$

$$ = \frac{1}{n - \delta - 1} \sum_{i=1}^{n-\delta} u_i\, u_{i+\delta} \qquad (35) $$

This autocovariance is called the 'sample autocovariance,' and it is used as an estimator of the real autocovariance at separation distance $\delta$. The real autocovariance is the one which would be obtained if the true values of soil properties at every point in space were known. The general expression of the sample autocovariance for any arbitrary distance $\delta$ is

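$$ C_x(\delta) = \frac{1}{n_\delta - k} \sum (x_i - t_i)(x_{i+\delta} - t_{i+\delta}) \qquad (36) $$

$$ = \frac{1}{n_\delta - k} \sum u_i\, u_{i+\delta} \qquad (37) $$

in which $n_\delta$ = the number of pairs of data at separation distance $\delta$, and k = the number of parameter estimates required for the trend. For n uniformly spaced data on a line with a constant-mean trend, $n_\delta = n - \delta$ (because there are $n - \delta$ pairs of data with separation distance $\delta$, with $\delta$ counted in sampling intervals) and k = 1 (because only one coefficient is needed to define a constant mean).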
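For equally spaced data, the moment estimate of Eqns. 36 and 37 and the corresponding autocorrelation can be sketched as below. The function names are illustrative. The autocorrelation is taken as $C_x(\delta)$ divided by the zero-separation value $C_x(0)$, which serves as the estimate of $V_u$ in Eqn. 33. The separation $\delta$ is counted in sampling intervals. The estimate is unusable when $n_\delta - k$ is zero or negative.

```python
def sample_autocovariance(x, trend, lag, k):
    """Moment estimate C_x(delta) = sum(u_i * u_(i+delta)) / (n_delta - k).

    x, trend: equally spaced measurements and trend values t_i
    lag: separation delta, in sampling intervals (0, 1, 2, ...)
    k: number of coefficients in the trend model
    """
    u = [xi - ti for xi, ti in zip(x, trend)]  # residuals, Eqn. 28
    n_delta = len(u) - lag                     # number of pairs at this separation
    if n_delta - k <= 0:
        raise ValueError("too few pairs for this separation")
    return sum(u[i] * u[i + lag] for i in range(n_delta)) / (n_delta - k)


def sample_autocorrelation(x, trend, lag, k):
    """R_x(delta) = C_x(delta) / C_x(0), so that R_x(0) = 1."""
    return (sample_autocovariance(x, trend, lag, k)
            / sample_autocovariance(x, trend, 0, k))


# Small example with a constant-mean trend (k = 1)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
mean_x = sum(x) / len(x)
t = [mean_x] * len(x)
c0 = sample_autocovariance(x, t, 0, k=1)   # 10 / 4 = 2.5
c1 = sample_autocovariance(x, t, 1, k=1)   # 4 / 3
r1 = sample_autocorrelation(x, t, 1, k=1)  # (4/3) / 2.5
```

The lag-1 correlation here comes entirely from the steady increase in x, which the constant-mean trend leaves in the residuals. This illustrates why the autocorrelation depends on the trend chosen.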

In the general case, measurements are seldom uniformly spaced and, at least in the horizontal plane, seldom lie on a line. For such situations the sample autocovariance can still be used as an estimator, but with some alteration. The most common way to accommodate non-uniformly placed measurements is by dividing separation distances into bands, and then taking the averages of Eqn. 36 within those bands (Fig. 27). This introduces some bias to the estimate, but for most engineering purposes it is sufficiently accurate.
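For irregularly placed measurements, the banding approach can be sketched as follows. This is one possible implementation under stated assumptions:

- Separations are straight-line distances computed with `math.dist` (Python 3.8+).
- Each pair of locations is counted once.
- Eqn. 36 is applied within each band, with $n_\delta$ taken as the number of pairs falling in that band.

The function name, locations, and band edges are illustrative. The example uses a known zero trend, so k = 0.

```python
import math


def banded_autocovariance(coords, values, trend, band_edges, k):
    """Apply Eqn. 36 within separation-distance bands.

    coords: list of (x, y) locations; values, trend: data and trend at those points
    band_edges: increasing distances; band b covers band_edges[b] <= d < band_edges[b+1]
    Returns one estimate per band, or None where the band has too few pairs.
    """
    u = [v - t for v, t in zip(values, trend)]
    n_bands = len(band_edges) - 1
    sums = [0.0] * n_bands
    counts = [0] * n_bands
    for i in range(len(u)):
        for j in range(i + 1, len(u)):  # each pair counted once
            d = math.dist(coords[i], coords[j])
            for b in range(n_bands):
                if band_edges[b] <= d < band_edges[b + 1]:
                    sums[b] += u[i] * u[j]
                    counts[b] += 1
                    break
    return [s / (c - k) if c > k else None for s, c in zip(sums, counts)]


# Hypothetical locations (m) and values with a known zero trend (k = 0)
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
values = [1.0, -1.0, 1.0]
estimates = banded_autocovariance(coords, values, [0.0] * 3, [0.0, 1.5, 2.5], k=0)
print(estimates)  # [-1.0, 1.0]
```

Wider bands contain more pairs and so give steadier estimates, at the cost of the bias mentioned above.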
Measurement Noise


Random measurement error is that part of data scatter attributable to instrument- or operator-induced variations from one test to another. This variability may sometimes increase a measurement and sometimes decrease it, but its effect on any one specific measurement is unknown. As a first approximation, instrument and operator effects on measured properties of soils can be represented by a frequency diagram as shown schematically in Fig. 28.

In repeated testing (presuming that repeated testing were possible on the same specimen), measured values differ. Sometimes the measurement is higher than the real value of the property, sometimes it is lower, and on average it may systematically differ from the real value. The systematic difference between the real value and the average of the measurements is said to be measurement bias, while the variability of the measurements about their mean is said to be random measurement error.
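Simulating repeated tests on one specimen illustrates the distinction between measurement bias and random measurement error. The true value, bias, and noise level below are hypothetical, and the errors are assumed to be normally distributed. The average offset of the readings estimates the bias, and their standard deviation estimates the random error.

```python
import random
import statistics

rng = random.Random(42)
true_value = 50.0   # hypothetical real property value
bias = 2.0          # hypothetical systematic offset of the test
noise_sd = 1.5      # hypothetical random measurement error

readings = [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(5000)]

est_bias = statistics.mean(readings) - true_value  # systematic difference
est_noise = statistics.stdev(readings)             # scatter about the mean
print(f"estimated bias = {est_bias:.2f}, random error s.d. = {est_noise:.2f}")
```

In practice the real value is not known, which is why bias cannot be detected from repeated testing alone.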

Sources and Character of Random Measurement Error or Noise

Random  errors  enter  measurements  of  soil  properties  through  a  variety  of 
sources  related  to  the  personnel  and  instruments  used  in  soil  investigations 
or  laboratory  testing. 

Operator or personnel errors arise in many types of measurements where reading scales is necessary, personal judgement is needed, or operators affect the mechanical operation of a piece of testing equipment (e.g., SPT hammers).

In each of these cases operator differences have systematic and random components. One person, for example, may consistently read a gage too high, another too low. If required to make a series of replicate measurements, a single individual may report numbers which vary one from the other over the series. Figure 29 shows histograms of strike and dip measurements made by many people on the same rock joint, using the same Brunton compass.

Such variability is common and widely recognized, and as soil testing moves to more and more automated procedures, this operator variability will decrease. With hand-operated field vane devices, an operator may unconsciously vary the rate of torque application from one test to another, thereby influencing measured undrained strengths. With an automated vane, such variability is lessened. Naturally, operators also sometimes make mistakes. If these mistakes are small and not easily identified by inspection, they too become random measurement errors.

Instrumental error arises from variations in the way tests are set up, loads are delivered, or soil response is sensed. The separation of measurement errors between operator and instrumental causes is not only indistinct but also unimportant for most purposes. In triaxial tests, soil samples may be positioned differently with respect to the loading platens in succeeding tests. Handling and trimming may cause differing amounts of disturbance from one specimen to the next. Piston friction may vary slightly from one movement to another, or temperature changes may affect fluids and solids. The aggregate result of all these variables is differences between measurements that are unrelated to the soil properties of interest.

Assignable causes of minor variation are always present because a very large number of variables affect any measurement. One attempts to control those which have important effects, but this leaves uncontrolled a large number which individually have only small effects on a measurement. If not identified, these assignable causes of variation may affect the precision and possibly the accuracy of measurements by biasing the results.

For example, hammer efficiency in the SPT strongly affects measured blow counts. Efficiency with the same hammer can vary by 50% or more from one blow to the next (Kavazanjian, 1983). Hammer efficiency can be controlled, but only at some cost. If uncontrolled, it becomes a source of random measurement error and increases the scatter in SPT data.

Models  for  Measurement  Error 

Random measurement errors are ones whose sign and magnitude cannot be predicted; they may be plus or minus. Typically, random errors tend to be small and tend to distribute themselves equally on both sides of zero. Measurement error is the cumulative effect of an indefinite number of small 'elementary' errors simultaneously affecting a measurement.

The  common  model  of  measurement  error  is, 

z  =  x  +  e  ,  (38) 

in  which  z  is  the  measurement,  x  is  the  soil  property  being  measured,  and  e  is 
a  random  error  of  zero  mean.  Were  systematic  errors  present,  the  mean  of  e 
would  differ  from  zero. 

An important property of e in Eqn. 38 is that it is assumed to be statistically independent from one measurement to another and to have the same mean (i.e., 0) and variance Ve for each measurement. The value e takes on at one measurement is assumed to be unrelated to the value it takes on at any other. This has important practical implications; for example, it means that if many measurements are averaged together to estimate a property, measurement noise averages out.
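As a rough illustration of this averaging, the sketch below simulates the model z = x + e and compares the variance of an average of n readings with Ve/n. It assumes, beyond the text, that e is Normally distributed; the function name, the true value, Ve, and the number of trials are all hypothetical.

```python
import random
import statistics


def variance_of_average(x_true, sd_e, n_avg, n_trials, seed=0):
    """Sample variance of the average of n_avg readings z = x + e (Eqn. 38).

    Each e is drawn independently from Normal(0, sd_e); the Normal form is an
    illustrative assumption, since Eqn. 38 only needs zero mean, a common
    variance Ve = sd_e**2, and independence between readings.
    """
    rng = random.Random(seed)
    averages = []
    for _ in range(n_trials):
        readings = [x_true + rng.gauss(0.0, sd_e) for _ in range(n_avg)]
        averages.append(statistics.fmean(readings))
    return statistics.variance(averages)


Ve = 4.0  # hypothetical noise variance
for n in (1, 4, 16):
    v = variance_of_average(x_true=20.0, sd_e=Ve ** 0.5, n_avg=n, n_trials=5000)
    print(f"n = {n:2d}: simulated {v:.3f}, Ve/n = {Ve / n:.3f}")
```

The simulated variances should track Ve/n closely, which is the sense in which independent noise "averages out".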

Random measurement error can be estimated in a variety of ways, some direct and some indirect. As a general rule, the direct techniques are difficult to apply to the soil measurements of interest to geotechnical engineers. Nevertheless, direct techniques provide insight into the nature of random errors. Indirect methods, on the other hand, are generally more practical.

Direct  Estimation  of  Measurement  Noise 

The  traditional  way  of  estimating  random  measurement  error  is  by 
replicate  testing.  The  same  property  is  measured  repeatedly  on  the  same 
specimen  and  the  results  compared.  An  example  was  shown  in  Fig.  29  with 
replicate  measurements  of  joint  strike  and  dip.  Presumably,  the  property 
being  measured  does  not  change  from  test  to  test,  so  the  variability  observed 
in  test  results  comes  from  random  errors. 
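When replicates are available on more than one specimen, one common way to combine them (assumed here, not prescribed by the text) is the pooled within-specimen variance: the squared deviations of each specimen's readings from that specimen's own mean, summed and divided by the total degrees of freedom. The function name and the dip readings below are hypothetical.

```python
import statistics


def pooled_replicate_variance(replicates):
    """Estimate Ve as the pooled within-specimen variance of replicate tests.

    replicates: a list of lists; each inner list holds repeated measurements
    on one specimen, assumed not to change from test to test.
    """
    ss = 0.0  # within-specimen sum of squared deviations
    dof = 0   # degrees of freedom, the sum of (n_j - 1)
    for reps in replicates:
        if len(reps) < 2:
            continue  # a single reading says nothing about Ve
        m = statistics.fmean(reps)
        ss += sum((z - m) ** 2 for z in reps)
        dof += len(reps) - 1
    if dof == 0:
        raise ValueError("need at least one specimen with two or more readings")
    return ss / dof


# Hypothetical dip readings (degrees) repeated on two joints
print(pooled_replicate_variance([[42, 45, 44], [60, 58]]))
```

Because each specimen is compared only with its own mean, real differences between specimens do not inflate the estimate.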

Replicate testing is a simple, direct, and accurate way of establishing random measurement error. Unfortunately, it is seldom of use because the properties engineers are most interested in are measured destructively. Performing the same test on different specimens, no matter how closely together they were sampled in the field, always leaves unanswered how much of the variability is due to measurement and how much to real differences in the soil.


Indirect  Estimation  of  Measurement  Noise 

Indirect  methods  for  estimating  Ve  usually  involve  correlations  of  the 
property  in  question  either  with  other  properties  such  as  index  tests,  or  with 
itself  through  the  autocorrelation  function.  The  easiest  and  most  powerful 
methods  involve  the  autocorrelation  function.  Combining  Eqns.  28  and  38, 
data  are  represented  as 
The autocovariance of z in Eqn. 39, after the trend has been removed, becomes


$$ C_z(\delta) = C_x(\delta) + C_e(\delta) \qquad (40) $$

in which Cx(δ) is from Eqn. 33, and Ce(δ) is the autocovariance function of e. Eqn. 40 can be verified by substituting z_i for x_i - t_i in Eqn. 32 and algebraically rearranging. Since e_i and e_j are independent except for i = j, the autocovariance function of e is a spike at δ = 0 and zero elsewhere. Thus, Cz(δ) is composed of two functions, as shown in Fig. 30. By extrapolating the observed autocovariance function to the origin, an estimate is obtained of the fraction of data scatter that comes from random error. For the data of Fig. 31, Ve ≈ 0.5Vz. In the "geostatistics" literature this is called the nugget effect.
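The extrapolation can be sketched numerically. The version below is one possible approach, not necessarily the report's own: it computes the sample autocovariance at small lags, fits a least-squares straight line through lags 1 to 3, and reads the noise share as the gap between C(0) and the line's intercept. The linear fit, the lags used, the function names, and the synthetic series (an AR(1) "soil" signal plus independent noise, each of unit variance) are all assumptions for illustration.

```python
import random


def autocovariance(z, max_lag):
    """Sample autocovariance C(lag), lag = 0..max_lag, of an equally spaced
    series. Only the mean is removed here; any trend is assumed to have been
    removed beforehand."""
    n = len(z)
    m = sum(z) / n
    d = [v - m for v in z]
    return [sum(d[i] * d[i + lag] for i in range(n - lag)) / n
            for lag in range(max_lag + 1)]


def noise_fraction(z, fit_lags=(1, 2, 3)):
    """Estimate Ve/Vz by extrapolating C(lag) to the origin.

    A least-squares straight line through C at the small nonzero lags in
    fit_lags is extended back to lag 0; the gap between C(0) and that
    intercept is attributed to the spike (nugget) from random noise.
    """
    c = autocovariance(z, max(fit_lags))
    xs = list(fit_lags)
    ys = [c[k] for k in xs]
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    spatial_c0 = ybar - slope * xbar  # extrapolated value at lag 0
    return (c[0] - spatial_c0) / c[0]


# Hypothetical series: unit-variance AR(1) "soil" plus unit-variance noise,
# so the true noise share of the variance is 0.5.
rng = random.Random(1)
phi, n = 0.9, 20000
x = [rng.gauss(0.0, 1.0)]
for _ in range(n - 1):
    x.append(phi * x[-1] + rng.gauss(0.0, (1 - phi ** 2) ** 0.5))
z = [xi + rng.gauss(0.0, 1.0) for xi in x]
print(f"estimated Ve/Vz = {noise_fraction(z):.2f}")
```

A straight line slightly under-reads a curved autocovariance near the origin, so the estimate tends to lean a little high; fitting only the smallest lags keeps that bias small.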

For the field vane data of Fig. 32, the random measurement error contribution to data scatter is about 20 kPa², or 40% of the variance. Fig. 33a shows the horizontal autocovariance function of the data in Fig. 32a; Fig. 33b shows the vertical autocovariance function. These data are analyzed by a different and more powerful procedure in Appendix A, which yields approximately the same estimate. Fig. 34 shows the vertical autocovariance of cone penetration resistance data in a copper porphyry tailings embankment. Here the measurement error is very small.

The importance of random measurement errors is well illustrated by a case involving a large number of shallow footings placed on approximately ten meters of uniform sand. The site was characterized by Standard Penetration Test blow count measurements, predictions were made of settlement, and settlements were subsequently measured (Hilldale-Cunningham, 1971).
Inspection of the SPT data and subsequent settlements reveals an interesting discrepancy. Since footing settlements on sand tend to be proportional to the inverse of the average blow count beneath the footing, it would be expected from Eqn. 19 that the coefficient of variation of the settlements approximately equals that of the vertically averaged blow counts. Mathematically, settlement is predicted by a formula of the form,


$$ \rho = \frac{\Delta q}{N}\, g(b) \qquad (41) $$


in which ρ = settlement, Δq = net applied stress at the base of the footing, N = average corrected blow count, and g(b) = a function of footing width (see Lambe and Whitman, 1969). Therefore, by Eqn. 19 the coefficient of variation of ρ should be,

$$ \Omega_\rho \approx \Omega_N \qquad (42) $$

In fact, the coefficient of variation of the vertically averaged blow counts is about Ω_N = 0.44. The observed values of total settlement for 268 footings have mean 0.35 inches and standard deviation 0.12 inches; so Ω_ρ = 0.12/0.35 = 0.34. Why the difference?

The explanation is found in estimates of the measurement noise in the blow count data. Plate 3 shows the horizontal autocorrelation function for the blow count data. By extrapolating this function to the origin, the noise (or high-frequency) content of the data is estimated to be about 50% of the data scatter variance.


This means that,

$$ \Omega_\rho \approx \sqrt{1 - 0.5}\,\Omega_N = \sqrt{0.5}\,(0.44) \approx 0.31 \qquad (43) $$

which is close to the observed variability of the settlements. Measurement noise of 50% or even more of the observed scatter of in situ test data, particularly the SPT, has been noted on several projects (e.g., Baecher, Marr, Lin, and Consla, 1980; Schmertmann, personal communication, 1986).
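The comparison above is simple arithmetic. The short check below reproduces it under the stated assumption that only the non-noise share of the blow-count variance propagates to settlement through the first-order relation of Eqn. 42; the variable names are illustrative.

```python
import math

omega_N = 0.44                 # COV of vertically averaged blow counts (text)
noise_share = 0.5              # share of blow-count variance judged to be noise (Plate 3)
mean_rho, sd_rho = 0.35, 0.12  # observed settlements of 268 footings, inches

omega_rho_naive = omega_N                                   # Eqn. 42 on raw scatter
omega_rho_spatial = omega_N * math.sqrt(1.0 - noise_share)  # noise variance removed
omega_rho_observed = sd_rho / mean_rho

print(f"naive {omega_rho_naive:.2f}, noise-corrected {omega_rho_spatial:.2f}, "
      f"observed {omega_rho_observed:.2f}")
```

Running it prints `naive 0.44, noise-corrected 0.31, observed 0.34`: removing the noise share brings the predicted coefficient of variation much closer to the observed one.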

However, while random measurement error exhibits itself in the autocorrelation or autocovariance function as a spike at δ = 0, real variability of the soil at a scale smaller than the minimum boring spacing cannot be distinguished from measurement error when using the extrapolation technique. Thus, the 'noise' component estimated from the horizontal autocovariance function need not be the same as that estimated from the vertical.

For many, but not all, applications the distinction between measurement error and small-scale variability is unimportant. For any engineering application in which average properties within some volume of soil are important, the small-scale variability averages quickly and therefore has little effect on predicted performance. Thus, for practical purposes it can be treated as if it were a measurement error. On the other hand, if performance depends on extreme properties, no matter their geometric scale, this unimportance no longer obtains. Some engineers think that piping (internal erosion) in dams is such a mode of performance. However, few physical mechanisms of performance easily come to mind which are strongly affected by small-scale spatial variability, unless those anomalous features are continuous over a large extent in at least one dimension.


Rejecting  Outlier  Data 


It is often the case with geotechnical measurements that one or more data differ strikingly from the bulk of the measurements made. This presents the often difficult question of whether to reject the data as anomalous or to decide that they reflect real and important variations in soil or rock mass properties. The decision that data are anomalous could mean one of at least two things: (a) that they are thought erroneous, or (b) that they are thought to be real but unimportant.

The profile of Fig. 35 shows SPT blow count data with depth in a silty sand deposit. Near elevation 73 in boring SS-56-66, one of the measurements appears very high (N = 12 bpf), at least compared with the apparent trend of the blow counts with depth. This high value may reflect local variation in soil properties or an interstratified layer of much stronger material. However, given that the high value does not appear in the nearby borings, it seems improbable that it reflects real and important variation in sand strength. More likely, the high value was caused by a rock fragment or small sand lens, or possibly by an error. A decision must be made either to treat the measurement as part of the data set and include it in the statistical analysis, or to reject it and remove it from the analysis.

In principle, it may be possible to retrace steps through the documentation of a testing program to see whether an explanation for the unusually large measurement can be found. Yet, unless an extraordinary quality assurance program has been followed, this tracing often leads to no clear answer. In such cases, the decision to accept or reject the measurement has to be made on the basis of the set of data alone, and on the relation of a particular measurement to the general characteristics of the bulk of the data.

The decision is ultimately subjective. By throwing out an 'outlier' one may in fact be throwing out the most interesting piece of information about the soil or rock formation.

In  judging  whether  an  extreme  value  is  real  and  important  or  simply  an 
outlier  which  can  be  rejected,  statisticians  prefer  to  follow  some  formal 
policy  whereby  the  probability  of  accepting  or  rejecting  the  measurement 
erroneously  can  be  calculated.  A  suitable  value  for  this  probability  is 
decided  upon,  and  then  an  explicit  rule  for  rejecting  data  is  derived. 

To  decide  whether  to  accept  or  reject  an  extreme  individual  measurement,  a 
common  procedure  is  to  use  the  quantity 

$$ t = \frac{z_i - m_z}{s_z} \qquad (44) $$

in which m_z and s_z are the mean and standard deviation of the set of measurements z_1, ..., z_n, which includes the suspect value z_i. If the z are Normally distributed, the quantity t should have a Student's t distribution with ν = n - 1 degrees of freedom. Thus, the probability of an individual measurement deviating as much from the mean as z_i does can be evaluated from tables of the Student's t distribution (e.g., Benjamin and Cornell, 1970).

Some critical probability level α is chosen, usually α = 0.05 or α = 0.01, and if the probability of a deviation at least as large as that observed with z_i is less than α, the measurement is rejected. Using this rule, the probability of rejecting a measurement z_i which truly is appropriately part of the data set is α.


Considering the outlying measurement in Fig. 35, the test value t equals

$$ t = \frac{12\,\text{bpf} - 3.8\,\text{bpf}}{2.56\,\text{bpf}} \approx 3.2 \qquad (45) $$


in which m = 3.8 bpf and s = 2.56 bpf. Comparing this value with tables of Student's t for n = 35 data (ν = n - 1 = 34 degrees of freedom), the probability of a deviation as large as that observed is about 0.001. Since this value is smaller than either common criterion, α = 0.05 or α = 0.01, the measurement is rejected from the data set. While the outlier test based on Eqn. 44 is exact only for data that are Normally distributed, it remains approximately correct as long as the data are not highly skewed. Therefore, for geotechnical applications it is usually satisfactory.
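Where t tables are not at hand, the tail probability can be computed directly. The sketch below integrates the Student's t density with Simpson's rule, using only the Python standard library, and applies the rejection rule to the Fig. 35 values. The numerical method and the function names are choices made for illustration; the probability returned is the one-sided tail of |t|, matching "a deviation at least as large as observed".

```python
import math


def t_pdf(x, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c) * (1.0 + x * x / nu) ** (-(nu + 1) / 2)


def t_upper_tail(t, nu, steps=2000):
    """P(T >= t) for t >= 0, from Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, nu) + t_pdf(t, nu)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, nu)
    return 0.5 - total * h / 3.0


def outlier_t_test(z_suspect, mean, sd, n, alpha=0.05):
    """Eqn. 44 screen: returns (t, one-sided tail probability, reject?)."""
    t = abs(z_suspect - mean) / sd
    p = t_upper_tail(t, n - 1)
    return t, p, p < alpha


# Values from the Fig. 35 example: m = 3.8 bpf, s = 2.56 bpf, n = 35
t, p, reject = outlier_t_test(12.0, 3.8, 2.56, 35, alpha=0.01)
print(f"t = {t:.2f}, P = {p:.4f}, reject at alpha = 0.01: {reject}")
```

The computed probability falls between 0.001 and 0.0025, consistent with the "about 0.001" read from tables, so the 12 bpf value is rejected at either common α.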

A shortcut outlier test that does not require computing the mean and standard deviation uses the test value

$$ r = \frac{z_n - z_{n-1}}{z_n - z_1} \qquad (46) $$


in which the measurements z_1, z_2, ..., z_n are listed in ascending order. The quantity (z_n - z_{n-1}) is the interval separating the largest from the second largest measurement, and (z_n - z_1) is the range of the data. Dixon (1953) has worked out critical values of r corresponding to probabilities α = 0.05 and α = 0.01 (Table 2). For a specific outlier to be tested, the value of r is computed and compared with the tabulated value for the chosen α level.

The test using r, however, only works well with small n (e.g., n < 10). Were we to compare the 12 bpf measurement only with the other measurements in the upper stratum of the same boring, which contains 4 measurements, then

$$ r = \frac{z_n - z_{n-1}}{z_n - z_1} = \frac{12 - 5}{12 - 3} \approx 0.78 \qquad (47) $$

which is slightly greater than the critical value for α = 0.05 (0.765 for n = 4), and therefore the measurement is again rejected from the data set.

The critical values of Table 2 also apply to the test of outlying values at the low end of the data set, using the test value

$$ r = \frac{z_2 - z_1}{z_n - z_1} \qquad (48) $$

in which z_2 is the second lowest measured value. As with the t test value of Eqn. 44, the r value assumes the data set to be Normally distributed.
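The shortcut test is easy to script. The sketch below computes r for the high end (Eqn. 46) or the low end (Eqn. 48) and compares it with the Table 2 entries for n = 3 to 5, the rows fully legible in this report. The function names are hypothetical, and so is the 4 bpf value in the example; the text gives only the 12, 5 and 3 bpf values that enter r, and any fourth value between 3 and 5 bpf leaves r unchanged.

```python
# Critical values of r from Table 2 (after Dixon, 1953), n = 3 to 5 only.
DIXON_CRITICAL = {
    0.05: {3: 0.941, 4: 0.765, 5: 0.642},
    0.01: {3: 0.988, 4: 0.889, 5: 0.780},
}


def dixon_r(values, end="high"):
    """Range statistic of Eqn. 46 (end="high") or Eqn. 48 (end="low")."""
    z = sorted(values)
    data_range = z[-1] - z[0]  # zero range (all values equal) is not handled
    if end == "high":
        return (z[-1] - z[-2]) / data_range
    return (z[1] - z[0]) / data_range


def dixon_reject(values, alpha=0.05, end="high"):
    """Return (r, critical value, reject?); KeyError if n is not tabulated."""
    r = dixon_r(values, end)
    critical = DIXON_CRITICAL[alpha][len(values)]
    return r, critical, r > critical


stratum = [12, 5, 4, 3]  # 4 bpf is a hypothetical stand-in value
print(dixon_reject(stratum, alpha=0.05))  # r = 7/9 exceeds 0.765: reject
print(dixon_reject(stratum, alpha=0.01))  # but it does not exceed 0.889
```

As in the text, the 12 bpf value is rejected at α = 0.05; at the stricter α = 0.01 level it would be retained.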


A problem when evaluating outliers on the low, and presumably unconservative, side of a data set is that the risk associated with incorrectly rejecting an anomalous measurement must be carefully considered. The decision to include or reject such a measurement often rests more on geological judgement than on engineering analysis.


Size  Effect  Factor 

The  volume  of  soil  influenced  by  an  in  situ  test,  or  contained  in  a 
laboratory  specimen,  is  small  compared  with  that  influenced  by  a  prototype 
structure.  To  make  predictions  of  how  the  prototype  will  perform,  one  needs 
to  estimate  the  properties  within  this  larger,  representative  volume  of  soil, 
and  the  variability  among  such  representative  volumes. 

This is done by assuming the representative volume to be composed of a large number of small elements, for example, each the size of a test specimen. The mean and standard deviation of the properties of the small elements are evaluated, and then the spatial structure described by the autocorrelation function is used to calculate the corresponding means and standard deviations for the larger volumes. These calculations are summarized in a size-effect factor, R, which in many cases can be expressed by simple formulas or can be graphed.

Spatial  Averaging 

Empirically, the variability of soil properties among small volumes of soil, say test specimens, is larger than that among large volumes, say the soil under a footing. Within a small volume, physical properties tend to be more or less the same throughout. Some individual specimens may have greater than average properties throughout while some may have less than average, but within each specimen there is less variability than there is among the average properties of different specimens. Within large volumes the opposite is true: there tends to be a mixture of high and low properties in any one volume.

Thus,  with  small  volumes  the  properties  of  individual  volumes  may  vary  sharply 
from  the  mean  across  the  site,  but  with  large  volumes  internal  variations 
balance  out  such  that  the  average  property  from  one  large  volume  to  another 
differs  very  little.  The  mean  of  large  volumes  remains  the  same  as  the  mean  of 
small  volumes,  but  the  standard  deviation  of  the  average  property  from  one 
large  volume  to  the  next  is  smaller  than  the  standard  deviation  of  the  average 
property  from  one  small  volume  to  the  next. 

The  extent  of  averaging  of  properties  within  a  large  volume  of  soil 
depends  on  the  structure  of  the  spatial  variation.  More  precisely,  the  extent 
of  averaging  depends  on  the  standard  deviation  of  properties  from  point  to 
point  and  on  the  autocorrelation  function. 
The influence of spatial averaging on the variability among average element properties can be illustrated by vertically averaging SPT blow counts in boring logs. Plate 4 shows a set of six SPT boring logs. First, one N value from each boring is randomly chosen and the mean and standard deviation of the 6 values are calculated. The mean is 3.3 bpf and the standard deviation is 2.5 bpf. The mean is about the same as before, but the standard deviation has gone down. Continuing, the greater the number of N values for each boring included in the average, the smaller the standard deviation of the 6 boring averages. The decrease of the standard deviation of average blow count as the number of N values included in each average increases is a manifestation of spatial averaging. The larger the volume of soil (i.e., the greater the number of values in each average), the more the individual fluctuations balance out.

The same thing happens in averaging soil properties within a continuous block of soil. The soil properties fluctuate somewhat from point to point, so the larger the block of soil over which the properties are averaged, the more the high and low fluctuations cancel out. The extent of spatial averaging can be measured by calculating the standard deviation among block averages. The more averaging that goes on within a block, the less variability there is from one block average to another.


For this simple case of averaging individual blow count measurements, the rate of decrease of the standard deviation as the number of data averaged in each boring increases can be approximately calculated. From Part IV, the standard deviation of the boring averages ought to decrease in proportion to 1/√k as the number of N values in each boring, k, increases, assuming that the blow counts are mutually independent (i.e., the correlation coefficient for each pair is zero). If the blow counts are not independent, that is, if they are autocorrelated, the standard deviation should decrease less quickly than 1/√k. The data show 'wavy' variations about their spatial mean, and therefore the balancing out of spatial variations takes place more slowly.
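The contrast between independent and autocorrelated blow counts can be sketched by simulation. Below, the counts within a boring are modelled, purely as an assumption, as a standardized AR(1) sequence with lag-one correlation phi (phi = 0 gives independent values). The function name, phi values, and number of borings are hypothetical.

```python
import math
import random
import statistics


def sd_of_boring_averages(k, phi, n_borings=4000, seed=2):
    """Standard deviation among boring averages of k consecutive values.

    Values within a boring follow a unit-variance AR(1) sequence with lag-one
    correlation phi; phi = 0 makes them mutually independent.
    """
    rng = random.Random(seed)
    innov_sd = math.sqrt(1.0 - phi ** 2)
    averages = []
    for _ in range(n_borings):
        x = rng.gauss(0.0, 1.0)  # start in the stationary distribution
        total = x
        for _ in range(k - 1):
            x = phi * x + rng.gauss(0.0, innov_sd)
            total += x
        averages.append(total / k)
    return statistics.stdev(averages)


for k in (1, 4, 16):
    print(f"k = {k:2d}: 1/sqrt(k) = {1 / math.sqrt(k):.3f}, "
          f"independent {sd_of_boring_averages(k, 0.0):.3f}, "
          f"autocorrelated {sd_of_boring_averages(k, 0.8):.3f}")
```

With independent values the simulated standard deviation follows 1/√k; with phi = 0.8 it falls much more slowly (roughly 0.64 rather than 0.25 at k = 16), which is the behavior described for 'wavy' data.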

This decrease in the standard deviation of soil properties averaged over a volume of soil is summarized by a size effect factor, R. For the averaging case, R is defined as the ratio of the variance of the average soil property within a large volume of soil to the variance among test-sized volumes,

$$ R = V_m / V_x \qquad (49) $$

in which V_m is the variance of the average or mean property among elements, and V_x is the variance among test-sized volumes. The ratio of variances rather than standard deviations is used because it is more convenient for subsequent error analysis calculations.

The  rate  at  which  R  decreases  with  increasing  soil  volume  depends  on  how 
erratic  the  spatial  variations  are  within  a  soil  element.  The  more  erratic 
they  are,  that  is,  the  shorter  their  'wave  length,'  the  more  averaging  that 
takes  place  within  a  given  volume  of  soil.  That  is,  the  extent  of  averaging  as 
reflected  in  R  depends  on  the  autocorrelation  function  of  the  soil  properties. 

The simplest (hypothetical) case occurs when a block of soil is thought of as composed of k smaller elements, each of which has internally uniform soil properties that are statistically independent of the properties of the other k - 1 elements. Let the mean of the individual element properties be m_x and their standard deviation be s_x. In this case the average property within the block is

$$ m_B = \frac{1}{k} \sum_{i=1}^{k} x_i \qquad (50) $$
and the standard deviation of the average among blocks is calculated by Eqn. 16 as

$$ s_{m_B} = \sqrt{\sum_{i=1}^{k} \left( \frac{\partial m_B}{\partial x_i} \right)^2 V_x} = \sqrt{k \cdot \frac{1}{k^2} V_x} = \frac{s_x}{\sqrt{k}} $$

So, as the number of elements in the block, k, goes up, the standard deviation among block averages goes down as 1/√k. This is approximately what happens in Plate 4, in which the SPT values show little vertical autocorrelation.

In any practical case, the soil block is not divided into discrete elements but is a continuum. The 'waviness' of soil property variations within the continuum is described by the autocorrelation function. Knowing the autocorrelation function, the exact shape of the relation of R to soil volume can be calculated in much the same way as the standard deviation of the block averages was calculated above.

The size effect factor R for spatial averaging of soil properties along a line is shown in Fig. 36. Three mathematical expressions are commonly used to model the decay of autocorrelation with separation distance, that is, the autocorrelation function: the exponential-squared, the exponential, and the power curve. These expressions are chosen as typical of the models used to summarize autocorrelation analytically. The size effect factor R differs among the three models for short lengths of averaging, but approaches an asymptotic value as L becomes large. The parameter θ is the autocorrelation distance, the distance at which autocorrelation decays to 1/e, in which e is the base of the natural logarithms.
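For averaging along a line of length L, R is the mean correlation over all pairs of points on the line. Using the standard identity that reduces this double integral to a single one, R = (2/L²) ∫₀ᴸ (L - d) ρ(d) dd, the sketch below evaluates R numerically for any model and in closed form for the exponential model ρ(d) = exp(-d/θ). Both models shown are taken to reach 1/e at d = θ; the power-curve model is omitted because its form is not given here. Function names and the L values are illustrative.

```python
import math


def R_line_exponential(L, theta):
    """Closed-form R for averaging along a line of length L when the
    autocorrelation is exponential, rho(d) = exp(-d / theta)."""
    a = L / theta
    return 2.0 / a ** 2 * (a - 1.0 + math.exp(-a))


def R_line_numeric(L, rho, steps=2000):
    """R = (2 / L**2) * integral from 0 to L of (L - d) * rho(d), by the
    midpoint rule; works for any autocorrelation model rho(d)."""
    h = L / steps
    total = 0.0
    for i in range(steps):
        d = (i + 0.5) * h
        total += (L - d) * rho(d) * h
    return 2.0 * total / L ** 2


theta = 1.0
for L in (0.5, 2.0, 10.0, 50.0):
    r_exp = R_line_exponential(L, theta)
    r_sq = R_line_numeric(L, lambda d: math.exp(-(d / theta) ** 2))
    print(f"L/theta = {L / theta:5.1f}: exponential {r_exp:.3f}, "
          f"exponential-squared {r_sq:.3f}, 2*theta/L = {2 * theta / L:.3f}")
```

R starts near 1 for short lines and, for the exponential model, approaches 2θ/L as L grows; the exponential-squared model reaches a similar decay with a different constant.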

Fig. 37 shows the size effect factor R for spatial averaging over a two-dimensional square. Fig. 38 shows R for spatial averaging within a three-dimensional cube. Both Figs. 37 and 38 are based on isotropic autocorrelation.

For spatial averaging of soil properties over other shaped surfaces, within other shaped volumes, or for other autocorrelation functions (e.g., anisotropic autocorrelation), the size effect factor R can be easily calculated using numerical simulation. This requires a programmable calculator or a small computer, but is simple. The procedure for calculating R for arbitrary geometries or arbitrary autocorrelation functions is the following:

1. Specify an analytical expression for the autocorrelation function in
the  desired  number  of  dimensions. 

2.  Using  a  random  number  generator,  randomly  choose  two  points  within 
the  geometric  volume  to  be  averaged  over. 

3.  Calculate  the  correlation  between  the  soil  properties  at  these  two 
points  from  the  autocorrelation  function. 

4.  Repeat  this  process  many  times,  at  least  100. 

5.  Sum  the  correlations  obtained  in  the  simulations  and  divide  by  the 
number  of  simulations  (find  average  correlation  coefficient).  This 
is  an  estimate  of  the  size  effect  factor  R. 

6.  The  numerical  precision  of  R  calculated  by  simulation  has  a  standard 
deviation  equal  to  the  standard  deviation  of  the  simulated 
correlations  divided  by  the  square  root  of  the  number  of  repetitions. 
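The six steps above translate almost line for line into a short program. The sketch below assumes an isotropic exponential autocorrelation with θ = 1 and uses 20,000 repetitions, comfortably above the minimum of 100. As a check, it applies the same routine to a line, whose R has a known closed form for this model. The region sizes and function names are hypothetical.

```python
import math
import random
import statistics


def estimate_R(sample_point, rho, n_sim=20000, seed=3):
    """Steps 2-6: average correlation between random pairs of points.

    sample_point(rng) returns a coordinate tuple drawn uniformly from the
    region averaged over; rho(d) is the (here isotropic) autocorrelation at
    separation d. Returns the estimate of R and its standard error.
    """
    rng = random.Random(seed)
    corr = [rho(math.dist(sample_point(rng), sample_point(rng)))
            for _ in range(n_sim)]
    return statistics.fmean(corr), statistics.stdev(corr) / math.sqrt(n_sim)


THETA = 1.0  # hypothetical autocorrelation distance


def exp_rho(d):
    """Step 1: an assumed isotropic exponential model, 1/e at d = THETA."""
    return math.exp(-d / THETA)


def in_square(rng, side=2.0):
    return (rng.uniform(0.0, side), rng.uniform(0.0, side))


def on_line(rng, length=2.0):
    return (rng.uniform(0.0, length),)


R_sq, se_sq = estimate_R(in_square, exp_rho)
R_ln, se_ln = estimate_R(on_line, exp_rho)
a = 2.0 / THETA
R_ln_exact = 2.0 / a ** 2 * (a - 1.0 + math.exp(-a))  # closed form, line only
print(f"square: R = {R_sq:.3f} +/- {se_sq:.3f}")
print(f"line:   R = {R_ln:.3f} +/- {se_ln:.3f} (closed form {R_ln_exact:.3f})")
```

Other shapes or anisotropic models only require a different `sample_point` or `rho`; the averaging in steps 4 to 6 is unchanged.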


The importance of spatial variability on calculated predictions depends not only on the volume of soil influenced but also on the mode of performance. For modes of performance which depend on average soil properties, spatial variability partially averages out, as described above. However, for modes which depend on worst conditions, for example sliding along a discontinuity or internal erosion in a dam, spatial variability is accentuated. In this latter case the size-effect factor may be greater than one, and the mean may also be altered. These cases are outside the scope of the present report.


Table 2

Frequency Distribution of Test Statistic for Outliers Based on Range (after Dixon, 1953)

Critical values of the test value r = (z_n - z_{n-1})/(z_n - z_1)

Sample Size    Critical Values
n              α = 0.05    α = 0.01
3              0.941       0.988
4              0.765       0.889
5              0.642       0.780
6              ...         0.698
7              ...         0.637

Abstracted from Dixon (1953)


PLATE  3 


SUBJECT:  Analysis  of  Noise  in  SPT  Blow  Count  Data 


Site  Conditions 

The site is underlain by fine dry sand to a depth of 10 m. Fifty SPT borings were made across the site, and a limited number of laboratory tests were run to correlate blow count with friction angle. The trend of depth-averaged blow counts corrected by Gibbs and Holtz's method is shown below. The mean of the depth-averaged SPT blow counts in the upper levels is 25 bpf; the standard deviation is 15.5 bpf. Laboratory tests on specimens recompacted to the in situ relative density led to an average friction angle of 36.4° and a standard deviation of 1.1°.
PLATE 4


SUBJECT:  Spatial  Averaging  of  SPT  Blow  Count  Data 
Figure 10. Spatial Data Displaying: (a) Trend With Location; (b) Erratic Variation With Location.

Figure 19. Horizontal Autocovariance Function of SPT Data in Figures 17 and 13.

Figure 22. Autocorrelation Function of Water Content Over Large Interval of San Francisco Bay Mud (after Javette, 1983). Javette uses the symbol r for autocorrelation and expresses distance in "lags" (i.e., steps), here of 25 ft.
Figure 29. Histogram of Strike and Dip Measurements Made by Operators on the Same Rock Joint.


Figure 30. Autocovariance Function Is Composed of Spatial Variability and Random Measurement Noise Signatures. (Panel: Actual Properties; horizontal axes: Separation.)
Figure 31b. Autocorrelation of SPT Data in Figure 31a
Figure 34. Vertical Autocovariance Function of Cone Penetration Data in a Copper Porphyry Tailings Deposit.


PART  IV:  SYSTEMATIC  ERRORS 


Thus  far  the  analysis  of  uncertainties  has  concentrated  on  data  scatter. 

It  was  seen  that  data  scatter  uncertainties  manifest  as  variability  across  a 
site,  for  example,  variability  of  settlement  from  one  footing  to  another. 
Another  type  of  uncertainty  is  also  important:  systematic  error. 

Uncertainties  due  to  systematic  errors  do  not  manifest  as  variability  across 
the  site,  but  appear  as  a  difference  between  the  predicted  average  performance 
and  the  average  performance  that  occurs  in  the  field.  Systematic  errors  are 
biases.  Usually  they  occur  because  errors  are  introduced  in  estimating  mean 
values  of  soil  properties,  loads,  or  other  input  variables. 


Sources  and  Importance  of  Systematic  Error 

The most important sources of systematic error in soil property estimates are measurement bias and statistical error. Measurement bias is caused by inadequacies in the way soil test results are obtained or interpreted. For example, the stress system imposed on a soil specimen during testing often differs from that encountered in a prototype situation. To the extent that strengths or other properties are affected by this difference in stress system, values calculated from test results will be inappropriate for predictions of prototype performance.

Statistical errors are due to limited numbers of tests. Because no two test results are ever the same, variations from one set of results to another cause variations from one sample mean or sample standard deviation to another. These variations go down as the number of measurements in a sample goes up, but they are always present. A sample statistic such as the mean or standard deviation always varies somewhat from the corresponding actual value across a soil deposit.


The importance of drawing a distinction between data scatter uncertainties and systematic errors is that the two affect predictions in different ways. For example, spatial variation affects the fraction of a large project, e.g., a long embankment, that might perform adversely. If spatial variation indicates a 10% likelihood of adverse performance, this means that problems should be expected with 10% of the embankment. On the other hand, systematic error affects the likelihood that the entire project performs adversely. If systematic error indicates a 10% likelihood of adverse performance, this means that problems with the whole embankment should be expected in one out of 10 projects. The two types of uncertainty must therefore be kept separate.
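A small Monte Carlo sketch can make this distinction concrete. It assumes a hypothetical, Normally distributed safety margin for each of many embankment segments, with the mean margin chosen so that any one segment has a 10% chance of adverse performance. The function name `fraction_failing` and all numbers are illustrative and are not taken from the report. With spatial variation only, every project has roughly 10% of its segments in trouble; with systematic error only, whole projects either perform or fail, about one time in ten.

```python
import numpy as np
from statistics import NormalDist

def fraction_failing(n_projects, n_segments, sd_spatial, sd_systematic,
                     p_adverse=0.10, seed=0):
    """Fraction of each project's segments whose (hypothetical) safety
    margin falls below zero. The mean margin is set so that the total
    probability of adverse performance of any one segment is p_adverse."""
    rng = np.random.default_rng(seed)
    sd_total = np.hypot(sd_spatial, sd_systematic)
    mean_margin = -NormalDist().inv_cdf(p_adverse) * sd_total
    # Systematic error: ONE draw per project, shared by all its segments.
    systematic = rng.normal(0.0, sd_systematic, size=(n_projects, 1))
    # Spatial variation: an independent draw for every segment.
    spatial = rng.normal(0.0, sd_spatial, size=(n_projects, n_segments))
    adverse = (mean_margin + systematic + spatial) < 0.0
    return adverse.mean(axis=1)

# Spatial variation only: each project has about 10% of its length in trouble.
spatial_only = fraction_failing(200, 1000, sd_spatial=1.0, sd_systematic=0.0)
# Systematic error only: whole projects fail (fraction 1.0) about 1 time in 10.
systematic_only = fraction_failing(2000, 1000, sd_spatial=0.0, sd_systematic=1.0)
print(spatial_only.min(), spatial_only.max())
print(np.unique(systematic_only), systematic_only.mean())
```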

A second difference between spatial variation and systematic error lies in the way they are affected by scale. If a very large volume of soil is considered, the uncertainty in average soil conditions may not be greatly affected by spatial variation, because above-average elements of soil balance against below-average elements. This averaging does not reduce systematic errors, which are the same everywhere.
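The scale effect can be sketched in the same way. Assuming, for simplicity, that the soil volume consists of n independent elements (real spatial variation is autocorrelated, so this overstates the averaging), the spatial part of the variance of the volume average shrinks as 1/n while the systematic part is untouched. The function name and the variances below are hypothetical.

```python
import numpy as np

def variance_of_average(v_spatial, v_systematic, n_elements):
    """Variance of the average property over n independent soil elements.

    The spatial part averages out (divided by n); the systematic part is
    shared by every element and does not.
    """
    return v_spatial / n_elements + v_systematic

# Monte Carlo check with hypothetical variances (arbitrary units).
rng = np.random.default_rng(1)
v_sp, v_sys, n = 1.0, 0.25, 50
trials = 20_000
systematic = rng.normal(0.0, np.sqrt(v_sys), size=trials)      # one draw per trial
spatial = rng.normal(0.0, np.sqrt(v_sp), size=(trials, n))     # one draw per element
averages = systematic + spatial.mean(axis=1)
print(variance_of_average(v_sp, v_sys, n), averages.var())
```

However large n becomes, the variance of the average never falls below the systematic contribution.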

It  is  often  convenient  to  think  of  spatial  variation  as  the  uncertainty  in 
soil  properties  caused  by  variations  from  spot  to  spot  in  a  soil  deposit. 
Systematic  errors  are  uncertainties  about  the  value  of  the  mean  or  trend  in 
soil  properties. 

Measurement  and  Model  Bias 

In testing soils, whether in the field or laboratory, a system of boundary conditions is applied to a specimen and the response is measured. From this response and a set of physical assumptions (i.e., a model), soil properties are calculated. These properties are then used with another model to predict performance. Non-random errors are introduced into this process at several points, and it is these which give rise to measurement bias. These non-random errors have systematic and variable parts. The systematic part is called the measurement bias. The zero-mean variable part is lumped with measurement noise and can therefore be treated as a random error. As a result, bias does not appear in the data scatter; it is a purely systematic error. Fig. 28 illustrates the distinction between systematic and random errors in measurements.


Causes  of  Measurement  and  Model  Bias 


Among the more common sources of measurement error in soil properties are (a) inappropriate boundary conditions, (b) inappropriate model assumptions, and (c) sample disturbance. In most cases there is little reason to separate measurement bias from model uncertainty. First, measurements and models are often inseparable; second, the best way to assess measurement bias is to backcalculate 'correct' parameters by modeling observed failures, which combines the effects of errors of measurement and errors of modeling.


Assessing  Magnitude  of  Bias 

The direct way to establish measurement bias is to compare predicted and observed performance. For field vane strengths, Bjerrum (1972) compared observed slope performance with predictions based on modified Bishop analysis and backcalculated the correction factor,


$$ \mu = \frac{c_u \ (\text{for } F = 1 \text{ at failure})}{c_u \ (\text{measured with FV})} $$


in which $c_u$ = undrained strength and $F$ = factor of safety. The correction factor reconciles observed failures with predictions (Fig. 39). This bias factor combines measurement technique and prediction model, and it is no longer appropriate if a stability model based on other assumptions is used (e.g., a 3D model).
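As a sketch of how such a factor might be tabulated from back-analysed failures, the snippet below forms one ratio per case history and summarizes the ratios by their mean and standard deviation. The strengths are hypothetical and the helper name `correction_factors` is illustrative. The spread among cases is only a rough indicator of the uncertainty in the correction, because each back-analysis carries its own errors.

```python
import statistics

def correction_factors(cu_backcalculated, cu_field_vane):
    """mu_i = c_u (F = 1 at failure) / c_u (measured with FV), one per case history."""
    if len(cu_backcalculated) != len(cu_field_vane):
        raise ValueError("need one field vane strength per back-analysed failure")
    return [cb / cv for cb, cv in zip(cu_backcalculated, cu_field_vane)]

# Hypothetical case histories (kPa): strength mobilized at failure vs. field vane.
mu = correction_factors([18.0, 24.0, 30.0], [20.0, 30.0, 30.0])
m_mu = statistics.mean(mu)    # best-estimate correction factor
s_mu = statistics.stdev(mu)   # scatter among cases (sample standard deviation)
print(mu, m_mu, s_mu)
```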

Introducing a measurement/model bias B into Eqn. 38 leads to the statistical model,

$$ z = B\,x + e \qquad (54) $$

and, to first order, the summation of variances,

$$ V_z = m_B^2\,V_x + m_x^2\,V_B + V_e \qquad (55) $$

in which $V_B$ is the uncertainty in the value of the bias correction B, and all parameters are evaluated at their means. In the special case where field vane measurements are used as input to modified Bishop analysis, $B = 1/\mu$.
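Eqn. 55 can be checked numerically. The sketch below uses hypothetical values for B, x, and e (all assumed independent and Normally distributed) and compares the first-order variance with a Monte Carlo estimate; the small difference is the product term $V_B V_x$ that a first-order expansion neglects.

```python
import numpy as np

def first_order_var_z(m_b, v_b, m_x, v_x, v_e):
    """Eqn. 55: first-order variance of z = B*x + e, evaluated at the means."""
    return m_b**2 * v_x + m_x**2 * v_b + v_e

# Hypothetical inputs: B ~ N(1.0, 0.1^2), x ~ N(30, 5^2), e ~ N(0, 2^2), independent.
rng = np.random.default_rng(2)
N = 200_000
b = rng.normal(1.0, 0.1, N)
x = rng.normal(30.0, 5.0, N)
e = rng.normal(0.0, 2.0, N)
z = b * x + e
print(first_order_var_z(1.0, 0.01, 30.0, 25.0, 4.0))  # about 38 (first order)
print(z.var())  # near 38.25; the extra 0.25 is the neglected V_B * V_x term
```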

Statistical  Error 


Because a limited number of measurements are made at any depth, about 40 in Fig. 32, their average may be above or below the actual spatial average even if there were no measurement bias. If another set of 40 borings had been made at slightly different locations, the test results would have been slightly different from those obtained here, and slightly different estimates of the average, standard deviation, and other parameters would have resulted. Thus, the average vane strength at any depth as shown in Fig. 40 probably differs somewhat from the (actual) spatial average that would be obtained from a very large number of measurements. That is, the estimate of the average is somewhat in error. To the extent the estimate is in error, this error is the same everywhere along the axis. It is a systematic error.

Statistical theory allows an assessment to be made of the probable magnitude of error that results from limited numbers of observations. One never knows beforehand the exact magnitude or direction of this statistical error, but the likely range of magnitudes can be calculated. Typically, statistical error is expressed as a variance or standard deviation on the estimated parameter. For example, the statistical error on the estimate of the average field vane strength at any depth in Fig. 40 would be expressed as a variance on the estimated average, $V_{m_x}$, in which $m_x$ is the estimate of the mean FV strength. The corresponding standard deviation of the estimate is called the standard error.

The larger the number of measurements at any depth, the lower one might expect the statistical error to be. In general, the variance of the statistical error decreases approximately in proportion to the reciprocal of the number of observations, n. Doubling the number of tests therefore reduces the standard error of a parameter such as the mean or standard deviation by a factor of about $1/\sqrt{2}$, that is, by about 30%. The benefit of increased testing displays marginally diminishing returns.
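A short calculation shows the diminishing returns. Assuming independent tests and a hypothetical data-scatter standard deviation of 10 kPa, each doubling of n shrinks the standard error of the mean by the same factor of $1/\sqrt{2}$, so each additional test buys less than the one before. The helper name is illustrative.

```python
import math

def standard_error_of_mean(s_x, n):
    """Standard error of the sample mean of n independent tests."""
    return s_x / math.sqrt(n)

s_x = 10.0  # hypothetical data-scatter standard deviation, kPa
for n in (10, 20, 40, 80):
    print(n, round(standard_error_of_mean(s_x, n), 2))
```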

Error  in  the  Mean 

From Eqn. 16, the variance of the statistical error of the mean of a population is approximately,

$$ V_{m_x} \approx \frac{V_x}{n} \qquad (56) $$

If repeated samples of n tests from the same soil deposit are made, if each of the tests is statistically independent of all others, and if for each sample the mean is calculated, then the variability of those means would have variance $V_x/n$.
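The repeated-sampling interpretation can be simulated directly. This sketch draws many hypothetical samples of n = 40 independent, Normally distributed tests and compares the variance of their means with $V_x/n$; the mean and variance used are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
v_x, n, repeats = 4.0, 40, 50_000   # hypothetical property variance, tests per sample, repeats
# Each row is one sample of n independent tests from the same (simulated) deposit.
samples = rng.normal(20.0, np.sqrt(v_x), size=(repeats, n))
means = samples.mean(axis=1)
print(means.var(), v_x / n)         # both should be close to 0.1
```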

Settlement of footings on cohesionless soils is often estimated to depend inversely on the average SPT blow count immediately beneath the footing, for example through an equation of the form of Eqn. 41. If only one SPT test is taken beneath the footing, the variance of the average N from one footing to another is, obviously, $V_N$. If more than one test is made and the results averaged, then the variance among the averages decreases, as can be seen in Fig. 41. As the number of tests n increases, this variance decreases as 1/n.

This  sampling  variance  of  the  estimate  of  the  mean  is  not  the  uncertainty 
of  the  estimate  directly,  but  the  variation  one  might  expect  to  see  in 
repeated  sampling  from  the  same  deposit.  Nevertheless,  under  fairly  general 
conditions  this  variance  is  close  or  identical  to  the  so-called  'Bayesian' 
variance  of  the  parameter  which  expresses  the  uncertainty  directly.* 

* More precisely, the posterior variance on $m_x$ in a Bayesian sense is $V_x/n$ if the prior distribution on $m_x$ is uniform and $V_x$ is known. If $V_x$ is unknown and the prior distribution on $(m_x, s_x)$ is noninformative ($\propto 1/s_x$), then the marginal posterior variance on $m_x$ is somewhat larger.

Eqn. 56 refers to the case in which measurements are statistically independent of one another. When the measurements are not independent, Eqn. 56 must be modified. The most common case in which measurements are dependent occurs when the spacings among the measurements are small, so that autocorrelation comes into effect. From Eqn. 15, the variance of $m_x$ accounting for dependence among the measurements is

$$ V_{m_x} = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} C_{x_i,x_j} \qquad (57) $$

in which $C_{x_i,x_j}$ = covariance between the measurements $x_i$ and $x_j$. The individual covariances can be estimated from the autocovariance function evaluated at the appropriate separation distance. For computer applications, a more convenient matrix version of Eqn. 57 is

$$ V_{m_x} = \left\{\tfrac{1}{n}\right\}^{\mathsf{T}} \mathbf{C}_x \left\{\tfrac{1}{n}\right\} \qquad (58) $$

in which $\{1/n\}$ is a vector of dimension n, each element of which is 1/n, and $\mathbf{C}_x$ is the covariance matrix of the observations. The ij-th element of $\mathbf{C}_x$ is $C_{x_i,x_j}$. If the measurements are widely spaced, Eqn. 58 reduces to Eqn. 56. In fact, in most practical applications Eqn. 56 is used unless the measurements are made very close together in space.
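A sketch of Eqns. 57 and 58 follows. The covariance matrix is built from an assumed exponential autocovariance model, $C(r) = V_x e^{-|r|/r_0}$, with a hypothetical correlation distance $r_0$; the report does not prescribe this model here, and the helper names are illustrative. Closely spaced measurements give a variance of the mean far larger than $V_x/n$, whereas widely spaced ones recover Eqn. 56.

```python
import numpy as np

def exponential_cov(locations, v_x, r0):
    """Covariance matrix from an ASSUMED autocovariance C(r) = v_x * exp(-|r| / r0)."""
    loc = np.asarray(locations, dtype=float)
    separation = np.abs(loc[:, None] - loc[None, :])
    return v_x * np.exp(-separation / r0)

def variance_of_mean(cov):
    """Eqn. 58: V[m_x] = w^T C_x w, with every weight w_i = 1/n."""
    n = cov.shape[0]
    w = np.full(n, 1.0 / n)
    return w @ cov @ w

v_x, r0 = 1.0, 5.0                                      # hypothetical values
close = exponential_cov(np.arange(10) * 1.0, v_x, r0)   # ten tests 1 unit apart
wide = exponential_cov(np.arange(10) * 100.0, v_x, r0)  # ten tests 100 units apart
print(variance_of_mean(close), variance_of_mean(wide), v_x / 10)
```

The matrix form is algebraically the same as the double sum of Eqn. 57 (it equals the sum of all entries of $\mathbf{C}_x$ divided by $n^2$), but it extends directly to unequal weights.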


Error  in  the  Standard  Deviation 


The variance of a soil property is usually estimated by the sample variance,

$$ s_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - m_x)^2 \qquad (59) $$

This is an unbiased estimate of $V_x$, the soil property variance, and has sampling variability characterized by

$$ V[s_x^2] = \frac{2\,V_x^2}{n-1} \qquad (60) $$

Eqn. 60 is exact when the data are Normally distributed, but only approximate otherwise. The uncertainty in the standard deviation of x is characterized approximately by the standard deviation of $s_x$,

$$ s_{s_x} \approx \frac{s_x}{\sqrt{2n}} \qquad (61) $$

Again, Eqn. 61 is a close approximation for Normally distributed $x_i$, but a rougher one otherwise. More detail is provided by Duncan (1974). For most purposes the uncertainty in $s_x$ can be ignored in developing a design profile. For example, the sample variance of the data of Fig. 32 is about $(10\ \text{kPa})^2$ with a sample size of n = 40 at any elevation. Thus, the standard deviation of $s_x$ from these data is approximately $10/\sqrt{80} \approx 1.1$ kPa.
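The following sketch evaluates Eqns. 60 and 61 and reproduces the 1.1 kPa figure for n = 40 and $s_x$ = 10 kPa. A Monte Carlo check with Normally distributed data (hypothetical mean of 30 kPa) is included to show how close the approximations are; the helper names are illustrative.

```python
import math
import numpy as np

def sd_of_sample_variance(v_x, n):
    """Eqn. 60 (Normal data): standard deviation of s_x^2 = sqrt(2 V_x^2 / (n - 1))."""
    return math.sqrt(2.0 * v_x**2 / (n - 1))

def sd_of_sample_sd(s_x, n):
    """Eqn. 61: approximate standard deviation of s_x = s_x / sqrt(2 n)."""
    return s_x / math.sqrt(2.0 * n)

print(round(sd_of_sample_sd(10.0, 40), 2))  # 1.12 (kPa), the Fig. 32 example

# Monte Carlo check: 50,000 hypothetical samples of n = 40 Normal tests.
rng = np.random.default_rng(4)
data = rng.normal(30.0, 10.0, size=(50_000, 40))
s = data.std(axis=1, ddof=1)                # sample standard deviation of each sample
print(s.std(), (s**2).std(), sd_of_sample_variance(100.0, 40))
```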


Error  in  Regression  Coefficients 

The estimates of slope and intercept coefficients in regression analysis are mathematically defined functions of the measurements from which they are inferred (i.e., each estimate is a function $g(x_1, \ldots, x_n)$ of the data), and thus the statistical error in these estimates can be calculated to a first-order approximation by the methods given in Part II,

$$ V_{g} \approx \sum_{i=1}^{n} \left( \frac{\partial g}{\partial x_i} \right)^2 V_{x_i} $$
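For a straight-line trend fitted by least squares, this first-order approach leads to the familiar expressions for the variances of the slope and intercept, and of the fitted trend at any depth. The sketch below uses those standard textbook formulas, assuming independent, equal-variance residuals; it is offered as an illustration and may differ in notation from the report's own regression equations. The helper name `fit_trend` and the data are hypothetical.

```python
import numpy as np

def fit_trend(depth, x):
    """Least-squares line x = a + b*depth with first-order error variances."""
    d = np.asarray(depth, dtype=float)
    x = np.asarray(x, dtype=float)
    n = d.size
    d_bar, x_bar = d.mean(), x.mean()
    s_dd = np.sum((d - d_bar) ** 2)
    b = np.sum((d - d_bar) * (x - x_bar)) / s_dd   # slope
    a = x_bar - b * d_bar                           # intercept
    resid = x - (a + b * d)
    s2 = np.sum(resid ** 2) / (n - 2)               # variance of scatter about the line
    return {
        "a": a, "b": b, "s2": s2,
        "var_a": s2 * (1.0 / n + d_bar ** 2 / s_dd),
        "var_b": s2 / s_dd,
        # variance of the fitted mean trend (error on the mean) at depth d0
        "var_trend": lambda d0: s2 * (1.0 / n + (d0 - d_bar) ** 2 / s_dd),
    }

# Hypothetical data: depth and maximum past pressure (arbitrary units).
fit = fit_trend([0, 1, 2, 3, 4], [1, 3, 2, 5, 4])
print(fit["a"], fit["b"], fit["var_a"], fit["var_b"], fit["var_trend"](2.0))
```

The error on the fitted trend is smallest at the mean depth and grows toward the ends of the data, which is why envelopes on a regression trend flare out with distance from the centre of the data.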
Figure 39. Bjerrum's Correction Factor for Field Vane Strength Measurements in Soft Clay.


PART  V:  CONSTRUCTING  A  STATISTICAL  SOIL  ENGINEERING  PROFILE 


This  final  part  presents  the  procedure  for  combining  best  estimates  of 
soil  properties  and  uncertainty  about  those  estimates  into  a  statistical  soil 
property  profile. 

The  design  profile  summarizes  available  information  on  the  variation  of 
soil  properties  with  depth.  Specifically,  the  design  profile  gives, 

A  best  estimate  of  soil  properties  with  depth,  and 
Uncertainty  envelopes  about  the  best  estimate. 

These  envelopes  show  the  magnitudes  of  two  types  of  uncertainty  in  the  soil 
property  estimates.  The  first  set  of  envelopes  shows  spatial  variability  of 
soil  properties  about  their  mean.  The  second  set  shows  uncertainty  or  error  in 
the  mean  itself.  Each  set  of  envelopes  shows  a  +/-  one  standard  deviation 
interval. 


Decomposition  of  Uncertainty 

The  methodology  presented  in  this  report  is  based  on  a  decomposition  of 
uncertainty  in  soil  property  estimates.  In  a  statistical  profile,  the  sources 
of  uncertainty  which  have  been  analyzed  and  quantified  separately  are  now 
brought  back  together. 

Uncertainty in soil property estimates has been divided into four components: (i) real (spatial) variability of the soil deposit, (ii) random measurement noise, (iii) statistical estimation error, and (iv) measurement or model bias (Fig. 44). The overall error in an estimate of soil properties at any one point in the soil profile is found by combining the individual contributions of the four sources.




The contributions are mathematically combined by taking advantage of a convenient result from probability theory: the variances (i.e., the squares of the standard deviations) of the individual contributions are additive (cf. Eqn. 15),

$$ \begin{aligned} V_x &= V_{\text{data scatter}} + V_{\text{systematic error}} \\ &= V_{\text{spatial variation}} + V_{\text{measurement noise}} + V_{\text{statistical error}} + V_{\text{measurement bias}} \end{aligned} \qquad (67) $$

in which $V_x$ = the total uncertainty in an estimate or prediction of soil property x, expressed as a variance.

In  separating  spatial  variability  and  systematic  error,  it  is  easiest  to 
think  of  spatial  variation  as  scatter  about  the  trend  and  to  think  of 
systematic  error  as  uncertainty  on  the  trend  itself.  The  first  envelope 
reflects  soil  variability  after  random  measurement  error  is  removed.  The 
second  envelope  reflects  statistical  error  and  measurement  bias. 

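Rearranging Eqn. 66, the variance of a soil property x is related to the variances of data scatter, measurement error and measurement bias by,

$$ V_x = \frac{V_z - V_e}{m_B^2} + m_x^2\,\Omega_B^2 \qquad (69) $$

in which $\Omega_B$ is the coefficient of variation of B. The additional uncertainty contributed by statistical error in $m_x$ adds to the right-hand side (RHS) of Eqn. 69 the term $V_{m_x}$ of Eqn. 56. The total variance in point-to-point values of x is thus,

$$ V_x = \frac{V_z - V_e}{m_B^2} + m_x^2\,\Omega_B^2 + V_{m_x} \qquad (70) $$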


The first term on the RHS is the contribution of spatial variation to $V_x$. The second term is the contribution of uncertainty in measurement bias. The third term is the contribution of statistical error. Taken together, the second and third terms are the systematic error in x, or the error on the mean value. The first term is the additional uncertainty due to variation of the soil from one location to another.

Note that the contribution of random measurement error $V_e$ appears only in its effect on statistical error. Thus, in specific instances (e.g., if $V_B$ is small and n is large), the variance in x, $V_x$, can be much less than the data scatter variance $V_z$.


Simple  Soil  Profile:  Field  Vane  Data 
This example illustrates the construction of a design profile for the case in which in situ test results are used directly to estimate soil properties.


Site  Conditions 


The facility was a long water-retaining embankment constructed on approximately 20 m of soft marine and lacustrine clays. Field vane data were collected at every 1 m of depth in 27 borings (Fig. 12) and showed considerable scatter. The scatter in the data varied with depth, with coefficients of variation ranging from 18 to 45%.

Horizontal and vertical autocovariance functions for the marine clay are shown in Figs. 33a and 33b. Extrapolations to the origin indicate that about 40% of the data scatter variance in the marine clay can be attributed to noise; however, little of the scatter in the lacustrine clay appears to be noise. This difference may be due to small-scale variability of the marine clay rather than measurement error, or it may be due to other differences between the two clays, e.g., in plasticity index or sensitivity. The resulting separation of data scatter, expressed as coefficients of variation, is given in Table 3.
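Extrapolating the autocovariance function to the origin can be sketched as follows. The code uses the moment estimator of the autocovariance for equally spaced data, fits an exponential model (an assumption) through lags 1 to 5 in log space, and treats the gap between the lag-zero value and the extrapolated curve as noise variance. A synthetic profile with a known 40% noise share stands in for real field vane data; all function names are hypothetical.

```python
import numpy as np

def sample_autocovariance(z, max_lag):
    """Moment estimate of the autocovariance of equally spaced data, lags 0..max_lag."""
    z = np.asarray(z, dtype=float)
    d = z - z.mean()
    n = d.size
    return np.array([np.dot(d[: n - k], d[k:]) / n for k in range(max_lag + 1)])

def noise_fraction(z, fit_lags=(1, 2, 3, 4, 5)):
    """Share of data-scatter variance attributed to noise: fit ln C(r) = a + b*r
    through lags >= 1 (assumed exponential model) and extrapolate to r = 0."""
    lags = np.asarray(fit_lags)
    c = sample_autocovariance(z, int(lags.max()))
    _, intercept = np.polyfit(lags, np.log(c[lags]), 1)
    spatial_variance = np.exp(intercept)    # extrapolated value at the origin
    return 1.0 - spatial_variance / c[0]    # c[0] is the total data-scatter variance

# Synthetic profile: AR(1) "spatial" signal (variance 0.6) plus white noise (0.4).
rng = np.random.default_rng(1)
n, phi, v_spatial, v_noise = 100_000, 0.8, 0.6, 0.4
innovations = rng.normal(0.0, np.sqrt(v_spatial * (1.0 - phi**2)), size=n)
x = np.empty(n)
x[0] = rng.normal(0.0, np.sqrt(v_spatial))
for t in range(1, n):
    x[t] = phi * x[t - 1] + innovations[t]
z = x + rng.normal(0.0, np.sqrt(v_noise), size=n)
print(round(noise_fraction(z), 2))  # should be near 0.4
```

With real profiles the number of usable lags is small and the extrapolation is correspondingly uncertain, which is one reason the report treats such separations with judgement.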

Systematic  Error 

Systematic uncertainty on the mean strength derives from two sources: statistical error due to limited numbers of tests, and measurement bias due to differences between the field vane strength and the actual strength mobilized in embankment failures. Statistical error can be calculated approximately by Eqn. 56, which assumes the tests to be independent. Given that the separation of the tests is larger than the autocovariance distances, this assumption seemed satisfactory.

Field vane correction factors, μ, were used to account for measurement bias. These were estimated starting from Bjerrum's chart, Fig. 43, and back-calculating strengths from local dyke failures. Uncertainties in the correction factors were estimated by judgement and inspection, as shown in Table 3. Due to a lack of laboratory strength and consolidation data at depth, a site-specific μ was not developed for the lacustrine clay.
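The entries of Table 3 are related by the additivity of variances; for quantities expressed as coefficients of variation about the same mean, this means combining them in root-sum-square. The sketch below, with illustrative helper names, recovers the marine clay spatial variability from data scatter and noise, and the total bias from the statistical and correction-factor terms for both clays.

```python
import math

def combine_cov(*components):
    """Root-sum-square of independent coefficients of variation."""
    return math.sqrt(sum(c * c for c in components))

def spatial_cov(cov_scatter, cov_noise):
    """Spatial part of data scatter after removing measurement noise."""
    return math.sqrt(cov_scatter**2 - cov_noise**2)

# Marine clay, Table 3
print(round(spatial_cov(0.236, 0.149), 3))  # 0.183
print(round(combine_cov(0.030, 0.075), 2))  # 0.08
# Lacustrine clay, Table 3
print(round(combine_cov(0.045, 0.15), 2))   # 0.16
```

The noise share of the marine clay scatter, $0.149^2/0.236^2 \approx 0.40$, is the 40% quoted above.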


Statistical  Soil  Engineering  Profile 


The resulting statistical soil engineering profile is shown in Fig. 45. The best estimate of undrained strength with depth is the mean undrained strength. The inner envelopes show ± one standard deviation due to spatial variation.

The total uncertainty in the value of undrained strength at any point, expressed as a variance, is found by adding the variance due to error in the mean to the variance due to spatial variation. Correspondingly, a standard deviation envelope on the total uncertainty in estimating soil properties at a point is found by taking the square root of the sum of the squared standard deviations of error in the mean and of spatial variation.

In Part III, a size effect factor R was introduced to account for the averaging out of spatial variation in a large volume of soil. For design use, this size effect factor R is applied to the spatial variability part of the soil property uncertainty. The uncertainty in average soil properties in such a volume of soil is found by reducing the spatial variance contribution by the factor R and adding this to the variance in the mean. This has been done for the profile of Fig. 45 to obtain Fig. 46. This figure shows the best estimate (mean) profile with ± one standard deviation envelopes appropriate to different size failure surfaces through the clay, for the purpose of limit equilibrium stability analysis. These envelopes are obtained as,

$$ m_x \pm \sqrt{R\,V_{\text{spatial}} + V_{\text{systematic}}} $$

In design, the statistical soil engineering profile of Fig. 46 is the starting point for error analysis, as described below.
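The envelope calculation can be sketched for a single depth. This assumes, for illustration only, that the marine clay values of Table 3 (mean 34.5 kPa, spatial coefficient of variation 0.183, total systematic coefficient of variation 0.08) apply at that depth; the function name `envelope` is hypothetical. R = 1 corresponds to a very small failure surface and R = 0 to a very large one.

```python
import math

def envelope(mean, cov_spatial, cov_systematic, r):
    """Best estimate +/- one standard deviation of the average strength on a
    failure surface, with the spatial variance reduced by the size factor R."""
    sd = mean * math.sqrt(r * cov_spatial**2 + cov_systematic**2)
    return mean - sd, mean + sd

for r in (1.0, 0.5, 0.1, 0.0):
    low, high = envelope(34.5, 0.183, 0.08, r)
    print(r, round(low, 1), round(high, 1))
```

Even for R = 0 the envelope does not collapse onto the mean, because the systematic error is not reduced by averaging.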


Simple  Soil  Profile:  SPT  Data 


The second example illustrates a case similar to the first, except that the site lies on silty-sand alluvium which was characterized by standard penetration testing.


Site Conditions and Data Scatter


The facility was a low water-retaining rockfill embankment associated with a large multiple-use water resource project. The foundation profile consisted of approximately 25 feet of alluvium in which a large number of borings were made (Fig. 47). The horizontal sample autocorrelation function for the SPT data, shown in Fig. 48, indicated little measurement noise. The supposition was that the lack of significant noise in the data was due to the looseness of the soil and the low average blow count. The data scatter varied somewhat with depth, giving a coefficient of variation of about 32%.


Systematic Error


Because the SPT data are used directly, that is, they are not translated into a fundamental soil property such as strength or deformability, no measurement bias term was used in developing a statistical soil property profile. The profile is expressed directly as SPT results. The statistical error in the estimate of the mean SPT blow count at any depth interval was calculated as per Eqn. 56. This is shown in Plate 5.
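Using the representative values in Plate 5, the statistical error column can be reproduced from $\sqrt{V[z]/n}$. This is only a check on the arithmetic, under the independence assumption of Eqn. 56; the helper name is illustrative.

```python
import math

def statistical_error(sd_scatter, n):
    """Standard error of the mean blow count, sqrt(V[z] / n) (Eqn. 56)."""
    return sd_scatter / math.sqrt(n)

# Representative Plate 5 values: standard deviation (blows/ft), tests per interval.
for station, sd, n in [("4+00 to 13+00", 2.9, 14), ("17+00 to 24+50", 2.8, 11)]:
    print(station, round(statistical_error(sd, n), 2))  # 0.78, then 0.84
```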

Statistical  Soil  Engineering  Profile 

The resulting statistical soil engineering profile is shown in Fig. 49.
This  profile  was  constructed  according  to  Eqn.  70.  The  vertical  bar  in  each 
case  shows  mean  water  elevation  plus  or  minus  one  standard  deviation  of  spatial 
and  temporal  variability. 


Derived  Soil  Profiles 

The foregoing cases illustrate the construction of a design profile for calculations directly relating field measurements to model parameters. Not all situations are direct in this way. Many involve profiles derived from field measurements, as, for example, when using normalized soil properties (e.g., the SHANSEP approach of Ladd and Foott, 1974). Such a derived soil profile was used in analyzing an ore stockpile on soft Gulf of Mexico clay.


Site  Conditions 

The facility was an industrial plant sited along a barge canal on 15 m of normally consolidated clay. Ores for processing are shipped up the canal and stockpiled next to a dock. Strength data for the site taken by field vane testing are scattered, as are maximum past pressure measurements (Fig. 50). This leads to uncertainty in factors of safety against instability. The uncertainty in factor of safety, in turn, leads to uncertainty on how high the stockpile can be built before strength increases from consolidation are required to provide stability.


Normalized Soil Properties (SHANSEP)


The field vane data are too few, too widely spaced, and too scattered to estimate soil properties confidently. Therefore, the decision was made to base stability predictions on normalized soil properties, and to determine the calibrating constants from measurements made in the laboratory.

The SHANSEP procedure was adopted, which relates undrained strength $c_u$ to in situ stress through the equation,

$$ \frac{c_u}{\sigma'_{vo}} = k \left[ \frac{\sigma'_{vm}}{\sigma'_{vo}} \right]^{q} \qquad (72) $$

in which $\sigma'_{vo}$ = effective vertical stress, $\sigma'_{vm}$ = maximum past pressure, and

$$ k = \left( \frac{c_u}{\sigma'_{vo}} \right)_{\text{normally consolidated}} $$

k is the undrained strength ratio for normally consolidated clay. The parameters k and q are considered material constants, and $[\sigma'_{vm}/\sigma'_{vo}]$ = OCR is the overconsolidation ratio.

Applying Eqn. 19,

providing a linear composition of the uncertainties on each of the three soil parameters, k, q, and $\sigma'_{vm}$.
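As an illustration of a first-order composition of this kind, the sketch below differentiates Eqn. 72 with respect to k, $\sigma'_{vm}$, and q and weights each squared derivative by that parameter's variance. It assumes the three parameters are independent and that $\sigma'_{vo}$ is known; it is a generic first-order expansion, not necessarily identical in form to the report's own equation. The means of k and q are those of Eqn. 75; the stresses and variances are hypothetical, as are the function names.

```python
import math

def shansep_cu(k, q, s_vo, s_vm):
    """Eqn. 72: c_u = s'_vo * k * (s'_vm / s'_vo) ** q."""
    return s_vo * k * (s_vm / s_vo) ** q

def shansep_variance_terms(m_k, v_k, m_q, v_q, s_vo, m_vm, v_vm):
    """First-order contributions to V[c_u] from k, s'_vm and q
    (parameters assumed independent, s'_vo assumed known)."""
    cu = shansep_cu(m_k, m_q, s_vo, m_vm)
    dcu_dk = cu / m_k                    # partial derivative with respect to k
    dcu_dvm = m_q * cu / m_vm            # partial derivative with respect to s'_vm
    dcu_dq = cu * math.log(m_vm / s_vo)  # partial derivative with respect to q
    return {
        "k": dcu_dk**2 * v_k,
        "s_vm": dcu_dvm**2 * v_vm,
        "q": dcu_dq**2 * v_q,
    }

# Hypothetical point: s'_vo = 1.0 ksf, mean s'_vm = 1.5 ksf; illustrative variances.
terms = shansep_variance_terms(0.21, 0.02**2, 0.86, 0.03**2, 1.0, 1.5, 0.2**2)
print(terms, sum(terms.values()))
```

Keeping the three terms separate, rather than only their sum, is what allows each to be split later into spatial and systematic parts.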

Soil  Data 

One-dimensional consolidation tests were made on specimens recovered in piston samples at the site. Maximum past pressure estimates $\sigma'_{vm}$ from these tests are shown in Fig. 11. The trend of $\sigma'_{vm}$ with depth was approximated by fitting a regression line to the data using Eqns. 20, 21, and 22. The least squares fit is shown in Fig. 11. The data scatter about the regression line was estimated using Eqn. 23 to be $s_x$ = 1 ksf.

Laboratory direct simple shear tests were performed to determine the soil parameters k and q for the undrained strength model of Eqn. 72. The results of these tests are shown in Fig. 51. From the test results and judgemental interpretation, the best estimates and standard deviations of q and k were concluded to be

$$ m_q = 0.86, \quad s_q = 0.03; \qquad m_k = 0.21, \quad s_k = 0.22 \qquad (75) $$


Because the measurements of q and k were made with care in the laboratory, and because too few data were available to establish the structure of spatial variation in an autocorrelation function, measurement noise was assumed to be zero. That is, the assumption was made that $V_e = 0$ for the soil parameter estimates q and k. This assumption is conservative in that uncertainty is overestimated, but the extent of conservatism was thought to be small. Because of limited data on $\sigma'_{vm}$, the same assumption that $V_e = 0$ was made for estimates of maximum past pressure.


Systematic Errors


Statistical errors in q and k were estimated from Eqn. 56. There is slight correlation in the estimates of q and k, because q is measured by the increase of normalized strength $c_u/\sigma'_{vo}$ with OCR, starting from k at OCR = 1.0. This correlation turned out to be small and was neglected. Statistical error in the trend of $\sigma'_{vm}$ with depth was estimated from the regression analysis using Eqn. 65. Plus/minus one standard deviation envelopes on the mean of $\sigma'_{vm}$ with depth are shown in Fig. 11. Measurement and model bias errors were estimated subjectively, based on experience with the SHANSEP procedure and on the quality of the laboratory testing program.

Statistical  Soil  Engineering  Profile 

The resulting statistical soil engineering profile is shown in Fig. 52. The best estimate of undrained strength with depth is the mean. The inner envelopes show plus or minus one standard deviation of error on the best estimate or mean. The outer envelopes show plus or minus one standard deviation of the spatial variation in undrained strength about the mean trend. The statistical profile was developed from Eqn. 74 by separately estimating spatial and systematic components for each of the three terms on the RHS, corresponding respectively to uncertainties in k, $\sigma'_{vm}$, and q (Plate 6). The three spatial contributions were added to get the total spatial variability, and the three systematic terms were added to get the total systematic error.

In essence, Eqn. 74 and the division of uncertainty into component types provide an accounting format for keeping track of where uncertainties or errors originate and how they logically combine.


In this project too few data were available to confidently assess autocorrelation functions from field data. As a result, the size effect summarized by the factor R could not be precisely quantified. Therefore, the final statistical profile shows only the limiting cases of spatial averaging: the case of very small failure surfaces, for which R = 1.0, and the case of very large failure surfaces, for which R ≈ 0. If a subsequent error analysis shows that this range of uncertainty is too large to be dealt with in design, then more data would have to be gathered.


Error  Analysis 

The  end  result  of  the  statistical  data  analysis  presented  in  this  report 
is  a  statistical  soil  profile  summarizing  data  scatter  and  estimates  of 
systematic  error.  The  design  profile  gives  a  best  estimate  of  soil  properties 
with  depth  and  two  sets  of  standard  deviation  envelopes,  one  on  the  mean  and 
one  on  spatial  variation. 

The next step is to incorporate this statistical characterization of soil property information in design calculations, that is, to use the means, standard deviations, and correlations of soil properties as the input to geotechnical modeling. The result of that modeling is a best estimate or mean prediction of engineering performance, accompanied by a standard deviation on the prediction. The techniques for accomplishing this are presented in the companion report, "Error Analysis for Geotechnical Engineering" (Contract Report GL-87-3).
Table 3

Summary of Parameter Estimates for Error Analysis of End-of-Construction Stability Analyses for an Embankment on Soft Clay

                                       Marine     Lacustrine
Field Vane Statistics
  Mean, kPa                            34.5       31.2
  Data Scatter, Ω_z                    0.236      0.272
  Spatial Variability, Ω_x             0.183      0.272
  Measurement Noise, Ω_e               0.149      0.000

Systematic Error
  Statistical, Ω_mx                    0.030      0.045
  Correction Factor, Ω_μ               0.075      0.15
  TOTAL Bias, Ω_(mx+μ)                 0.08       0.16

TOTAL
  Ω_x, TOTAL


Table 4

Soil Profile Uncertainties for Error Analysis

                                            Variance
Variable          Expected Value     Spatial   Systematic   TOTAL
depth of crust    4 m                0.96      0.036        1.0
depth to till     18.5 m             0.0       1.0          1.0
fill density      20 kN/m^3          1.0       1.0          2.0
PLATE 5

SUBJECT: Statistical Soil Engineering Profile for SPT Data

DESIGN PROFILE:

(1) DATA SCATTER: SPT

Station                          4+00-13+00   17+00-24+50   25+00-32+00
mean (bpf)                       4.8          6.9
standard deviation               2.9          2.8
coefficient of variation         0.60         0.41
Measurement noise                -            -
  (from Figures 4.6, 4.7)
Spatial variability              2.9          2.8
  √V[x] = √(V[z] - V[e])

(2) SYSTEMATIC ERROR

Station                          4+00-13+00   17+00-24+50   25+00-32+00
number of measurements           14           11
  per depth interval*
Statistical error                0.78         0.84          0.99
  √V[m_x] = √(V[z]/n)
Model bias                       n/a          n/a           n/a
Total systematic error           0.78         0.84          0.98

* Varies with depth; numbers are representative.

(3) DESIGN PROFILE

(Shown as Figure 49)
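The two formulas in Plate 5 can be reproduced directly. The sketch below recomputes the statistical error of the mean, √(V[z]/n), from the tabulated standard deviations and representative n, and the spatial variability √(V[z] − V[e]). Function names are hypothetical; because Plate 5 gives no numerical noise estimate, the sketch assumes V[e] = 0, which reproduces the tabulated spatial values.

```python
import math

def statistical_error(std_dev, n):
    """Standard deviation of the sample mean: sqrt(V[z] / n)."""
    return math.sqrt(std_dev ** 2 / n)

def spatial_std(scatter_std, noise_std):
    """Spatial variability with measurement noise removed: sqrt(V[z] - V[e])."""
    return math.sqrt(scatter_std ** 2 - noise_std ** 2)

# Standard deviation (bpf) and representative n for the first two station ranges
stations = {"4+00-13+00": (2.9, 14), "17+00-24+50": (2.8, 11)}
for label, (s, n) in stations.items():
    print(label, round(statistical_error(s, n), 2))  # 0.78, then 0.84

# No noise estimate is tabulated, so V[e] = 0 is assumed here
print(round(spatial_std(2.9, 0.0), 2))
```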
Figure 44. Sources of Error or Uncertainty in Soil Property Estimates
In Situ Undrained Strength Data for a Site on the Gulf of Mexico
Laboratory Test Results to Determine SHANSEP Strength
Parameters. From Noirav, 1982.
APPENDIX A: STATISTICAL CONSIDERATIONS IN ESTIMATING AUTOCOVARIANCE


ESTIMATION  OF  AUTOCOVARIANCE  FUNCTIONS 
This appendix briefly discusses alternative statistical approaches to
estimating autocovariance functions from soil data. Detailed presentations of
the mathematical procedures and statistical properties of the techniques are
given in Spikula (1983) and DeGroot (1985).

Three techniques are commonly used to estimate autocovariance functions in
the analysis of site characterization data: the moment estimator, the BLUE
minimization estimator, and the maximum likelihood estimator. These have
different strengths and weaknesses, and may lead to slightly differing results.


Moment estimator

The moment estimator uses the autocovariance function calculated directly
from the observed measurements as an estimator of the autocovariance of the
underlying spatial process:

$$C_z(\delta) = \frac{1}{n_\delta - 1} \sum_{i=1}^{n_\delta} (z_i - m_z)(z_{i+\delta} - m_z) \qquad (A1)$$

in which n_δ = the number of data pairs at separation distance δ, and m_z = the
mean of the measurements.
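A minimal NumPy sketch of Eq. A1, assuming equally spaced measurements so that the separation δ can be counted in grid steps, and assuming any trend has already been removed so that a single mean m_z applies. The function name is hypothetical. The 1/(n_δ − 1) normalization follows Eq. A1; some texts divide by n_δ instead, which changes the small-sample bias.

```python
import numpy as np

def moment_autocovariance(z, max_lag):
    """Moment estimate of C_z(delta) for delta = 0..max_lag grid steps (Eq. A1)."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    if max_lag > n - 2:
        raise ValueError("need at least two data pairs at every lag")
    m_z = z.mean()
    c = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        n_pairs = n - lag  # n_delta: number of pairs separated by `lag` steps
        products = (z[:n_pairs] - m_z) * (z[lag:] - m_z)
        c[lag] = products.sum() / (n_pairs - 1)
    return c

# Usage: c[0] is the sample variance; c / c[0] estimates the autocorrelation function
c = moment_autocovariance([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=2)
```

The need for a uniform grid in this sketch illustrates the limitation noted below: with irregular spacing, pairs must first be binned by separation distance.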


BLUE  minimization  estimator 

The BLUE minimization estimator uses the autocorrelation function that
minimizes the squared error between estimated and observed soil properties at
the measurement points as an estimate of the autocovariance of the underlying
spatial process. That is, the soil property at each observed point is estimated
by removing that measurement from the data base and using the remaining (n - 1)
data in a best linear unbiased estimation (BLUE) technique (Spikula, 1983). The
autocovariance function that minimizes the variance of the error between
observed and predicted measurements is taken as the estimate. This is a
parametric model in that the mathematical shape of the autocovariance function
must be specified.
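The leave-one-out idea above can be sketched as follows. This is an illustration under assumptions, not Spikula's procedure: it uses ordinary kriging as the BLUE (unknown constant mean), assumes an exponential autocovariance C(δ) = σ² exp(−δ/r0), and picks r0 from a user-supplied list by minimizing the mean squared prediction error. Because ordinary-kriging weights do not change when the covariance is scaled by σ², only r0 is searched. All function names are hypothetical.

```python
import numpy as np

def exp_cov(d, r0, sill=1.0):
    """Assumed exponential autocovariance: sill * exp(-|d| / r0)."""
    return sill * np.exp(-np.abs(d) / r0)

def loo_blue_errors(x, z, r0):
    """Leave-one-out prediction errors, using ordinary kriging as the BLUE."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    n = len(x)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        xs, zs = x[keep], z[keep]
        m = len(xs)
        # Kriging system: covariances among the remaining points, bordered by
        # the unbiasedness constraint (weights sum to one).
        a = np.ones((m + 1, m + 1))
        a[:m, :m] = exp_cov(xs[:, None] - xs[None, :], r0)
        a[m, m] = 0.0
        b = np.ones(m + 1)
        b[:m] = exp_cov(xs - x[i], r0)
        weights = np.linalg.solve(a, b)[:m]
        errors[i] = z[i] - weights @ zs
    return errors

def fit_r0_blue(x, z, candidates):
    """Return the candidate r0 with the smallest mean squared leave-one-out error."""
    scores = [np.mean(loo_blue_errors(x, z, r0) ** 2) for r0 in candidates]
    best = int(np.argmin(scores))
    return candidates[best], scores[best]
```

Each trial r0 requires n linear solves of size n, which reflects the computational burden mentioned in the comparison below.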

Maximum  Likelihood  Estimator 

The maximum likelihood estimator uses the autocorrelation function that
maximizes the conditional probability of the measurements actually made (i.e.,
the 'likelihood') as the estimator of the autocovariance of the underlying
spatial process:

$$\hat{C}_z(r) = \arg\max_{C_z(r)} L[z_1, \ldots, z_n] = \arg\max_{C_z(r)} MN(\mathbf{z};\, X\boldsymbol{\beta},\, \Sigma_z) \qquad (A2)$$

in which L[z] = the likelihood or conditional probability of the vector of data
z; MN( ) = the multinormal probability density function; β = a vector of
regression coefficients for the mean trend of the data; X = the matrix of
location coefficients, each row of which is <1, x_i, x_i^2, ..., x_i^k>, where k
is the order of the regression surface; and Σ_z = the covariance matrix of the
observations, calculated via the autocovariance function (DeGroot, 1985). This
is also a parametric model.
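The likelihood in Eq. A2 can be sketched for a one-dimensional profile under stated assumptions: an exponential autocovariance with parameters (σ², r0), a polynomial trend of order k whose coefficients β are estimated by generalized least squares for each trial covariance, and a coarse grid search in place of a proper optimizer. Function names are hypothetical; this is not DeGroot's (1985) program.

```python
import numpy as np

def neg_log_likelihood(x, z, sill, r0, order=1):
    """-log MN(z; X beta, Sigma_z) with an assumed exponential autocovariance."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    n = len(z)
    X = np.vander(x, order + 1, increasing=True)  # rows <1, x_i, ..., x_i^k>
    sigma = sill * np.exp(-np.abs(x[:, None] - x[None, :]) / r0)
    L = np.linalg.cholesky(sigma)  # Sigma_z = L L^T
    # Whitening by L^-1 turns generalized least squares into ordinary least squares
    Xw = np.linalg.solve(L, X)
    zw = np.linalg.solve(L, z)
    beta, *_ = np.linalg.lstsq(Xw, zw, rcond=None)
    r = zw - Xw @ beta  # whitened residuals
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return 0.5 * (n * np.log(2.0 * np.pi) + log_det + r @ r), beta

def fit_ml(x, z, sills, r0s, order=1):
    """Grid search for the (sill, r0) pair that maximizes the likelihood."""
    best = None
    for sill in sills:
        for r0 in r0s:
            nll, beta = neg_log_likelihood(x, z, sill, r0, order)
            if best is None or nll < best[0]:
                best = (nll, sill, r0, beta)
    return best
```

Because the data enter only through Σ_z and the row vectors of X, irregularly spaced measurements need no special treatment, which is the advantage noted in the comparison below.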

Comparison  of  Estimation  Techniques 

The moment estimator is by far the most commonly used approach in present
(1985) practice, but it has statistical limitations. Its advantages are that it
is mathematically and conceptually easy to use and that it requires relatively
modest computation. Its disadvantages are that it is statistically biased and
inefficient, and that it is difficult to use when data are not sampled on
uniform grids.


The BLUE estimator has not been widely used in geotechnical engineering, but
it is common in mining engineering and 'geostatistics.' Its principal
advantages are that it is more flexible than the moment estimator in making use
of non-uniformly sampled data, and that it requires less intuitive input. Its
principal disadvantages are that it is computationally intensive and that its
statistical properties are poorly studied.

The maximum likelihood estimator is not widely used in either geotechnical
engineering or mining, but it is increasingly common in other areas of
statistical data processing (e.g., time series analysis and signal processing).
Its major advantage is that its statistical properties are well known and
desirable (e.g., it is asymptotically unbiased and efficient), and that it
easily accommodates non-uniformly sampled data. Its major disadvantage is that
the computational algorithms it requires are complicated, although not
demanding of computer time. This disadvantage can be overcome by using packaged
programs.

Packaged computer programs are available for each of the three methods of
estimating autocorrelation functions. Most can be tailored to run on present
microcomputers.


APPENDIX  B:  SYMBOL  LIST 


a,b  =  regression  coefficients 

=  constant 

B  =  measurement  bias  correction  coefficient 

b  =  footing  width 

Cf  =  cost  of  failure 

CR  =  risk  cost 

Cx(δ) = autocovariance function for separation distance δ

Cxy = covariance of x and y
=  covariance  matrix 
CRec  =  virgin  compression  ratio 

CRer  =  recompression  ratio 

cu  =  undrained  strength 

d  =  embedment  depth  of  footing 

D,d  =  geometric  properties  of  scatter  graph 
e  =  random  measurement  error 

fi = cumulative frequency of observation i

E  =  elastic  modulus 

F  =  factor  of  safety 

FV  =  field  vane 

G = matrix of derivatives with ijth element ∂yi/∂xj

g(x) = deterministic function of x

H  =  horizontal  load 

H,h  =  geometric  properties  of  scatter  graph 

Hi  =  stratum  thickness 

h = SHANSEP strength parameter

i  =  dilation  angle 

k  =  counter  number 

mx  =  mean  of  x 

n  =  number  of  measurements 

L  =  length 

L[z]  =  likelihood  of  z 

mv  =  vertical  compression  coefficient 

N  =  SPT  blow  count 

Nγ = bearing capacity factor

OCR  =  overconsolidation  ratio 

PBC  =  probability  of  bearing  capacity  failure 

Pf  =  probability  of  failure 

Pp  =  probability  of  excessive  settlement 

Pr{.}  =  probability  of 

q  =  SHANSEP  strength  parameter 

q  =  applied  footing  stress 

qvo  =  design  stress 

qv  =  bearing  capacity 

rxy = correlation coefficient of x and y

r0 = autocorrelation distance, Rx(r0) = 1/e

R  =  size  effect  factor 

Rx(δ) = autocorrelation function for separation distance δ

sx  =  standard  deviation  of  x 
t = Student's t statistic

ti = trend

 = residual variation about regression line

 = vertical load

V[x] = variance of x

wx = range of x

x = soil property

x (vector) = vector of data x1, ..., xn

xi = ith measurement of property x, or x at location i

xmax = largest value of x

xmin = smallest value of x

x0.25 = 25th fractile of x

x0.5 = 50th fractile of x

x0.75 = 75th fractile of x

y = predicted performance variable

y0 = design specification on variable y

z = measured soil property; depth

α = critical probability level

β = reliability index

β (vector) = vector of regression coefficients

γ = soil density

δ = separation distance

δ0 = autocorrelation distance

ε = strain

η = point of expansion in Taylor's series

θ = slope angle

μ = Bjerrum's FV correction factor

 = degrees of freedom

ρ = settlement

σ = stress

σ'vm = maximum past pressure

σ'vo = effective vertical stress

σ'vf = final consolidation stress

φ' = effective stress friction angle

Ω = coefficient of variation

 = outlier test statistic